Methods for identifying compounds of interest using encoded libraries

ABSTRACT

The present invention provides a method for identifying a compound of interest by screening libraries of molecules which include an encoding oligonucleotide tag.

RELATED APPLICATIONS

This application claims priority to U.S. Provisional Application No. 60/731,464, filed Oct. 28, 2005. This application is related to U.S. Patent Application No. 60/689,466, filed Jun. 9, 2005, pending, and U.S. patent application Ser. No. 11/015458 filed Dec. 17, 2004. This application is also related to U.S. Provisional Patent Application Ser. No. 60/530,854, filed on Dec. 17, 2003; U.S. Provisional Patent Application Ser. No. 60/540,681, filed on Jan. 30, 2004; U.S. Provisional Patent Application Ser. No. 60/553,715 filed Mar. 15, 2004; and U.S. Provisional Patent Application Ser. No. 60/588,672 filed Jul. 16, 2004. The entire contents of each of the foregoing applications are incorporated herein by reference.

BACKGROUND OF THE INVENTION

The search for more efficient methods of identifying compounds having useful biological activities has led to the development of methods for screening vast numbers of distinct compounds, present in collections referred to as combinatorial libraries. Such libraries can include 10⁵ or more distinct compounds. A variety of methods exist for producing combinatorial libraries, and combinatorial syntheses of peptides, peptidomimetics and small organic molecules have been reported.

The two major challenges in the use of combinatorial approaches in drug discovery are the synthesis of libraries of sufficient complexity and the identification of molecules which are active in the screens used. It is generally acknowledged that the greater the degree of complexity of a library, i.e., the number of distinct structures present in the library, the greater the probability that the library contains molecules with the activity of interest. Therefore, the chemistry employed in library synthesis must be capable of producing vast numbers of compounds within a reasonable time frame. However, for a given formal or overall concentration, increasing the number of distinct members within the library lowers the concentration of any particular library member. This complicates the identification of active molecules from high complexity libraries.
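The dilution effect described above is simple arithmetic: at a fixed total concentration, each member's share falls in proportion to library size. The sketch below assumes that all members are present at equal concentration (an idealization; real libraries are not perfectly equimolar), and the 1 µM total is an illustrative value, not one taken from this disclosure.

```python
def per_member_concentration(total_molar: float, n_members: int) -> float:
    """Concentration of one member, assuming every member is equimolar."""
    if n_members < 1:
        raise ValueError("a library needs at least one member")
    return total_molar / n_members


# Illustrative 1 uM total library concentration (assumed, not from the text).
for n in (10**3, 10**5, 10**7):
    print(f"{n:>10,} members -> {per_member_concentration(1e-6, n):.1e} M each")
```

At 10⁵ members, each compound in this example is present at only about 10 pM, which illustrates why higher complexity makes active members harder to detect.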

One approach to overcoming these obstacles has been the development of encoded libraries, and particularly libraries in which each compound includes an amplifiable tag. Such libraries include DNA-encoded libraries, in which a DNA tag identifying a library member can be amplified using techniques of molecular biology, such as the polymerase chain reaction. However, the use of such methods for producing very large libraries is yet to be demonstrated, and it is clear that improved methods for producing such libraries are required for the realization of the potential of this approach to drug discovery.

SUMMARY OF THE INVENTION

Traditional drug discovery methods have relied on multi-step selection processes, often involving the amplification (e.g., PCR amplification) of nucleic acid molecules, and the sequencing of up to 1,000 or more of the top clones. This multi-step selection process and the nucleic acid amplification often lead to the introduction of many biases (as discussed in, for example, Holt, L. J., et al. (2000) Nucleic Acids Res. 28(15):E72). The presence of these biases typically leads to the selection of compounds that lack the desired biological activity.

The present invention provides improved methods as compared to the prior art methods in that it provides methods which eliminate the foregoing biases. For example, the present invention provides methods of identifying a compound of interest using a massively parallel sequencing approach which leads to the accurate identification of a compound with a desired biological activity using fewer selection steps. Moreover, as described herein, a unique tagging system has been developed that eliminates biases introduced by nucleic acid amplification, e.g., PCR amplification. In addition, the methods described herein allow for an expansive and extensive analysis of the selected compounds having a desired biological property, which, in turn, allows for related compounds with familial structural relationships to be identified (structure activity relationships). In summary, using the methods of the invention, a single step selection/enrichment cycle can be performed and then sequencing can be performed at the single molecule level, preferably without the need for any nucleic acid amplification.

Accordingly, in one aspect, the invention provides a method for identifying one or more compounds which bind to a biological target. The method comprises synthesizing a library of compounds, wherein the compounds comprise a functional moiety comprising two or more building blocks which is operatively linked to an initial oligonucleotide which identifies the structure of the functional moiety by providing a solution comprising m initiator compounds, wherein m is an integer of 1 or greater, where the initiator compounds consist of a functional moiety comprising n building blocks, where n is an integer of 1 or greater, which is operatively linked to an initial oligonucleotide which identifies the n building blocks, dividing the solution described above into r reaction vessels, wherein r is an integer of 2 or greater, thereby producing r aliquots of the solution, reacting the initiator compounds in each reaction vessel with one of r building blocks, thereby producing r aliquots comprising compounds consisting of a functional moiety comprising n+1 building blocks operatively linked to the initial oligonucleotide, and reacting the initial oligonucleotide in each aliquot with one of a set of r distinct incoming oligonucleotides in the presence of an enzyme which catalyzes the ligation of the incoming oligonucleotide and the initial oligonucleotide, under conditions suitable for enzymatic ligation of the incoming oligonucleotide and the initial oligonucleotide; thereby producing r aliquots of molecules consisting of a functional moiety comprising n+1 building blocks operatively linked to an elongated oligonucleotide which encodes the n+1 building blocks; contacting the biological target with the library of compounds, or a portion thereof, under conditions suitable for at least one member of the library of compounds to bind to the target, removing library members that do not bind to the target, sequencing the encoding oligonucleotides of the at least one member of the library of compounds which binds to the target, and using the foregoing sequences to determine the structure of the functional moieties of the members of the library of compounds which bind to the biological target, thereby identifying one or more compounds which bind to the biological target.
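The split-and-pool synthesis cycle in the method above can be sketched as a small simulation. This is a minimal illustration that assumes the pool is modeled as a list of distinct species (copy numbers and reaction yields are ignored) and that each incoming oligonucleotide is represented by a short hypothetical "codon" string appended to the tag; the building-block names and codons are placeholders, not sequences from this disclosure.

```python
from typing import List, Tuple

# A library member: (building blocks in order of addition, encoding tag).
Member = Tuple[Tuple[str, ...], str]


def split_and_pool(pool: List[Member], cycle: List[Tuple[str, str]]) -> List[Member]:
    """Run one synthesis cycle.

    `cycle` holds r (building_block, codon) pairs. The pool is divided into r
    aliquots; aliquot j is coupled to building block j and codon j is ligated
    onto the tag, so the tag records every block added so far. The aliquots
    are then recombined into a single pool.
    """
    pooled: List[Member] = []
    for block, codon in cycle:              # one reaction vessel per pair
        for blocks, tag in pool:            # every species is present in every aliquot
            pooled.append((blocks + (block,), tag + codon))
    return pooled


# Hypothetical initiator: m = 1 compound with n = 1 building block, whose
# initial oligonucleotide "ACG" encodes that block.
initiators: List[Member] = [(("Scaf",), "ACG")]
cycle1 = [("Ala", "AAA"), ("Gly", "CCC"), ("Phe", "GGG")]   # r = 3
cycle2 = [("Am1", "TTA"), ("Am2", "TTC")]                   # r = 2

library = split_and_pool(split_and_pool(initiators, cycle1), cycle2)
print(len(library))   # 6
print(library[0])     # (('Scaf', 'Ala', 'Am1'), 'ACGAAATTA')
```

Because each of the r aliquots receives one building block and one matching codon, the library grows by a factor of r per cycle (here 3 × 2 = 6 members), and every tag stays unique as long as the codons used within a cycle are distinct.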

In one embodiment, the methods of the invention may further comprise amplifying the encoding oligonucleotide of the at least one member of the library of compounds which binds to the target prior to sequencing.

In one embodiment, the method of amplifying comprises forming a water-in-oil emulsion to create a plurality of aqueous microreactors, wherein at least one of the microreactors comprises the at least one member of the library of compounds that binds to the target, a single bead capable of binding to the encoding oligonucleotide of the at least one member of the library of compounds that binds to the target, and an amplification reaction solution containing reagents necessary to perform nucleic acid amplification; amplifying the encoding oligonucleotide in the microreactors to form amplified copies of the encoding oligonucleotide; and binding the amplified copies of the encoding oligonucleotide to the beads in the microreactors.
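The emulsion step works best when most occupied microreactors hold a single template and bead. The paragraph above does not specify loading statistics; a common modeling assumption (not stated in this disclosure) is that templates distribute among droplets according to a Poisson distribution, so a low mean loading makes multiply occupied droplets rare. The mean of 0.1 templates per droplet below is an illustrative value only.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a droplet receives exactly k templates at mean loading lam."""
    return lam ** k * exp(-lam) / factorial(k)


lam = 0.1                                     # assumed mean templates per droplet
p_empty = poisson_pmf(0, lam)
p_single = poisson_pmf(1, lam)
p_multiple = 1.0 - p_empty - p_single
clonal_fraction = p_single / (1.0 - p_empty)  # single-template share of occupied droplets

print(f"empty {p_empty:.3f}, single {p_single:.4f}, multiple {p_multiple:.4f}")
print(f"clonal fraction of occupied droplets: {clonal_fraction:.3f}")
```

Under this model, lowering the mean loading raises the clonal fraction at the cost of more empty droplets.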

In one embodiment, the method of sequencing comprises annealing an effective amount of a sequencing primer to the amplified copies of the encoding oligonucleotide and extending the sequencing primer with a polymerase and a predetermined nucleotide triphosphate to yield a sequencing product and, if the predetermined nucleotide triphosphate is incorporated onto a 3′ end of the sequencing primer, a sequencing reaction byproduct, and identifying the sequencing reaction byproduct, thereby determining the sequence of the encoding oligonucleotide.
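The sequencing step above adds one predetermined nucleotide at a time and infers incorporation from the reaction byproduct. The sketch below idealizes this as a noise-free "flowgram": each flow dispenses one nucleotide in a fixed cyclic order, and the signal equals the number of bases incorporated, so a homopolymer run gives a proportionally larger signal. The flow order TACG and the integer signals are assumptions for illustration; real instruments report analog intensities that must be base-called. For simplicity the sketch tracks the strand being synthesized rather than its template.

```python
from typing import List

FLOW_ORDER = "TACG"  # assumed dispensation order; real instruments vary


def flowgram(synthesized: str, n_flows: int) -> List[int]:
    """Idealized signal per flow: how many bases of the dispensed nucleotide
    are incorporated next. 0 means no incorporation and no byproduct.
    n_flows must be large enough to cover the whole sequence."""
    signals, pos = [], 0
    for i in range(n_flows):
        base = FLOW_ORDER[i % len(FLOW_ORDER)]
        run = 0
        while pos < len(synthesized) and synthesized[pos] == base:
            run += 1
            pos += 1
        signals.append(run)
    return signals


def decode(signals: List[int]) -> str:
    """Rebuild the synthesized sequence from integer flow signals."""
    return "".join(FLOW_ORDER[i % len(FLOW_ORDER)] * s for i, s in enumerate(signals))


seq = "AACGTTTG"
sig = flowgram(seq, 12)
print(sig)          # [0, 2, 1, 1, 3, 0, 0, 1, 0, 0, 0, 0]
print(decode(sig))  # AACGTTTG
```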

In one embodiment, sequencing is performed using the polymerase chain reaction. In another embodiment, sequencing is performed using a pyrophosphate sequencing method or using a single molecule sequencing by synthesis method.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic representation of ligation of double stranded oligonucleotides, in which the initial oligonucleotide has an overhang which is complementary to the overhang of the incoming oligonucleotide. The initial strand is represented as either free, conjugated to an aminohexyl linker or conjugated to a phenylalanine residue via an aminohexyl linker.

FIG. 2 is a schematic representation of oligonucleotide ligation using a splint strand. In this embodiment, the splint is a 12-mer oligonucleotide with sequences complementary to the single-stranded initial oligonucleotide and the single-stranded incoming oligonucleotide.

FIG. 3 is a schematic representation of ligation of an initial oligonucleotide and an incoming oligonucleotide, when the initial oligonucleotide is double-stranded with covalently linked strands, and the incoming oligonucleotide is double-stranded.

FIG. 4 is a schematic representation of oligonucleotide elongation using a polymerase. The initial strand is represented as either free, conjugated to an aminohexyl linker or conjugated to a phenylalanine residue via an aminohexyl linker.

FIG. 5 is a schematic representation of the synthesis cycle of one embodiment of the invention.

FIG. 6 is a schematic representation of a multiple round selection process using the libraries of the invention.

FIG. 7 is a gel resulting from electrophoresis of the products of each of cycles 1 to 5 described in Example 1 and following ligation of the closing primer. Molecular weight standards are shown in lane 1, and the indicated quantities of a hyperladder, for DNA quantitation, are shown in lanes 9 to 12.

FIG. 8 is a schematic depiction of the coupling of building blocks using azide-alkyne cycloaddition.

FIGS. 9 and 10 illustrate the coupling of building blocks via nucleophilic aromatic substitution on a chlorinated triazine.

FIG. 11 shows representative chlorinated heteroaromatic structures suitable for use in the synthesis of functional moieties.

FIG. 12 illustrates the cyclization of a linear peptide using the azide/alkyne cycloaddition reaction.

FIG. 13 a is a chromatogram of the library produced as described in Example 2 following Cycle 4.

FIG. 13 b is a mass spectrum of the library produced as described in Example 2 following Cycle 4.

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to methods of producing compounds and combinatorial compound libraries, the compounds and libraries produced via the methods of the invention, and methods of using the libraries to identify compounds having a desired property, such as a desired biological activity. The invention further relates to the compounds identified using these methods.

A variety of approaches have been taken to produce and screen combinatorial chemical libraries. Examples include methods in which the individual members of the library are physically separated from each other, such as when a single compound is synthesized in each of a multitude of reaction vessels. However, these libraries are typically screened one compound at a time, or at most, several compounds at a time and do not, therefore, result in the most efficient screening process. In other methods, compounds are synthesized on solid supports. Such solid supports include chips in which specific compounds occupy specific regions of the chip or membrane (“position addressable”). In other methods, compounds are synthesized on beads, with each bead containing a different chemical structure.

Two difficulties that arise in screening large libraries are (1) the number of distinct compounds that can be screened; and (2) the identification of compounds which are active in the screen. In one method, the compounds which are active in the screen are identified by narrowing the original library into ever smaller fractions and subfractions, in each case selecting the fraction or subfraction which contains active compounds and further subdividing until attaining an active subfraction which contains a set of compounds which is sufficiently small that all members of the subset can be individually synthesized and assessed for the desired activity. This is a tedious and time-consuming activity.

Another method of deconvoluting the results of a combinatorial library screen is to utilize libraries in which the library members are tagged with an identifying label, that is, each label present in the library is associated with a discrete compound structure present in the library, such that identification of the label reveals the structure of the tagged molecule. One approach to tagged libraries utilizes oligonucleotide tags, as described, for example, in U.S. Pat. Nos. 5,573,905; 5,708,153; 5,723,598 and 6,060,596; published PCT applications WO 93/06121; WO 93/20242; WO 94/13623; WO 00/23458; WO 02/074929 and WO 02/103008; and by Brenner and Lerner (Proc. Natl. Acad. Sci. USA 89, 5381-5383 (1992)); Nielsen and Janda (Methods: A Companion to Methods in Enzymology 6, 361-371 (1994)); and Nielsen, Brenner and Janda (J. Am. Chem. Soc. 115, 9812-9813 (1993)), each of which is incorporated herein by reference in its entirety. Such tags can be amplified using, for example, the polymerase chain reaction, to produce many copies of the tag, and the tag can then be identified by sequencing. The sequence of the tag then identifies the structure of the binding molecule, which can be synthesized in pure form and tested. To date, there has been no report of the use of the methodology disclosed by Lerner et al. to prepare large libraries. The present invention provides an improvement in methods to produce DNA-encoded libraries, as well as the first examples of large (10⁵ members or greater) libraries of DNA-encoded molecules in which the functional moiety is synthesized using solution phase synthetic methods.

The present invention provides methods which enable facile synthesis of oligonucleotide-encoded combinatorial libraries, and permit an efficient, high-fidelity means of adding such an oligonucleotide tag to each member of a vast collection of molecules.

The methods of the invention include methods for synthesizing bifunctional molecules which comprise a first moiety (“functional moiety”) which is made up of building blocks, and a second moiety operatively linked to the first moiety, comprising an oligonucleotide tag which identifies the structure of the first moiety, i.e., the oligonucleotide tag indicates which building blocks were used in the construction of the first moiety, as well as the order in which the building blocks were linked. Generally, the information provided by the oligonucleotide tag is sufficient to determine the building blocks used to construct the active moiety. In certain embodiments, the sequence of the oligonucleotide tag is sufficient to determine the arrangement of the building blocks in the functional moiety, for example, for peptidic moieties, the amino acid sequence.
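Because the tag records both which building blocks were used and the order in which they were added, decoding can be as simple as reading one codon per cycle. The sketch below assumes a hypothetical tag layout of fixed-length, three-base codons in ligation order with no primer or spacer regions, and uses made-up per-cycle codon tables; real tag designs will differ. Keeping a separate table for each cycle means that a codon's position, not just its sequence, identifies the cycle and therefore the order of the building blocks.

```python
from typing import Dict, List

# Hypothetical per-cycle codon tables (codon -> building block).
CODON_TABLES: List[Dict[str, str]] = [
    {"AAA": "Ala", "CCC": "Gly", "GGG": "Phe"},   # cycle 1
    {"TTA": "Leu", "TTC": "Ser"},                 # cycle 2
]
CODON_LEN = 3  # assumed codon length


def decode_tag(tag: str) -> List[str]:
    """Read one codon per cycle, in ligation order, and look up its block."""
    if len(tag) != CODON_LEN * len(CODON_TABLES):
        raise ValueError("tag length does not match the number of cycles")
    blocks = []
    for cycle, table in enumerate(CODON_TABLES):
        codon = tag[cycle * CODON_LEN:(cycle + 1) * CODON_LEN]
        if codon not in table:
            raise KeyError(f"unknown codon {codon!r} in cycle {cycle + 1}")
        blocks.append(table[codon])
    return blocks


print(decode_tag("CCCTTC"))   # ['Gly', 'Ser']
```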

The term “functional moiety”, as used herein, refers to a chemical moiety comprising one or more building blocks. Preferably, the building blocks in the functional moiety are not nucleic acids. The functional moiety can be a linear or branched or cyclic polymer or oligomer or a small organic molecule.

The term “building block”, as used herein, is a chemical structural unit which is linked to other chemical structural units or can be linked to other such units. When the functional moiety is polymeric or oligomeric, the building blocks are the monomeric units of the polymer or oligomer. Building blocks can also include a scaffold structure (“scaffold building block”) to which one or more additional structures (“peripheral building blocks”) are, or can be, attached.

It is to be understood that the term “building block” is used herein to refer to a chemical structural unit as it exists in a functional moiety and also in the reactive form used for the synthesis of the functional moiety. Within the functional moiety, a building block will exist without any portion of the building block which is lost as a consequence of incorporating the building block into the functional moiety. For example, in cases in which the bond-forming reaction releases a small molecule (see below), the building block as it exists in the functional moiety is a “building block residue”, that is, the remainder of the building block used in the synthesis following loss of the atoms that it contributes to the released molecule.
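The distinction between a building block and its residue matters whenever a property such as molecular mass is computed for the functional moiety. For a linear moiety built entirely by condensations that each release one water molecule (for example, amide bond formation), the moiety mass is the sum of the building-block masses minus one water per bond formed: M = Σ M(BBᵢ) − (n − 1)·M(H₂O). The sketch below applies this with standard average molecular weights for three amino acids; reactions that release a different small molecule (such as HCl) or none at all (such as cycloadditions) would need a different correction, which is why the lost mass is a parameter.

```python
# Average molecular weights in g/mol (standard values, rounded to 3 decimals).
MW = {"Gly": 75.067, "Ala": 89.094, "Phe": 165.192}
WATER = 18.015


def linear_moiety_mass(blocks, lost_per_bond=WATER):
    """Mass of a linear chain in which each of the len(blocks) - 1 couplings
    releases one small molecule of mass `lost_per_bond`, so every block is
    present as a building block residue."""
    if not blocks:
        return 0.0
    return sum(MW[b] for b in blocks) - lost_per_bond * (len(blocks) - 1)


print(round(linear_moiety_mass(["Gly", "Ala"]), 3))  # 146.146 (Gly-Ala dipeptide)
```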

The building blocks can be any chemical compounds which are complementary, that is, the building blocks must be able to react together to form a structure comprising two or more building blocks. Typically, all of the building blocks used will have at least two reactive groups, although it is possible that some of the building blocks (for example, the last building block in an oligomeric functional moiety) used will have only one reactive group each. Reactive groups on two different building blocks should be complementary, i.e., capable of reacting together to form a covalent bond, optionally with the concomitant loss of a small molecule, such as water, HCl, HF, and so forth.

For the present purposes, two reactive groups are complementary if they are capable of reacting together to form a covalent bond. In a preferred embodiment, the bond forming reactions occur rapidly under ambient conditions without substantial formation of side products. Preferably, a given reactive group will react with a given complementary reactive group exactly once. In one embodiment, complementary reactive groups of two building blocks react, for example, via nucleophilic substitution, to form a covalent bond. In one embodiment, one member of a pair of complementary reactive groups is an electrophilic group and the other member of the pair is a nucleophilic group.

Complementary electrophilic and nucleophilic groups include any two groups which react via nucleophilic substitution under suitable conditions to form a covalent bond. A variety of suitable bond-forming reactions are known in the art. See, for example, March, Advanced Organic Chemistry, fourth edition, New York: John Wiley and Sons (1992), Chapters 10 to 16; Carey and Sundberg, Advanced Organic Chemistry, Part B, Plenum (1990), Chapters 1-11; and Collman et al., Principles and Applications of Organotransition Metal Chemistry, University Science Books, Mill Valley, Calif. (1987), Chapters 13 to 20; each of which is incorporated herein by reference in its entirety. Examples of suitable electrophilic groups include reactive carbonyl groups, such as acyl chloride groups, ester groups, including carbonyl pentafluorophenyl esters and succinimide esters, ketone groups and aldehyde groups; reactive sulfonyl groups, such as sulfonyl chloride groups; and reactive phosphonyl groups. Other electrophilic groups include terminal epoxide groups, isocyanate groups and alkyl halide groups. Suitable nucleophilic groups include primary and secondary amino groups, hydroxyl groups and carboxyl groups.

Suitable complementary reactive groups are set forth below. One of skill in the art can readily determine other reactive group pairs that can be used in the present method, and the examples provided herein are not intended to be limiting.

In a first embodiment, the complementary reactive groups include activated carboxyl groups, reactive sulfonyl groups or reactive phosphonyl groups, or a combination thereof, and primary or secondary amino groups. In this embodiment, the complementary reactive groups react under suitable conditions to form an amide, sulfonamide or phosphonamidate bond.

In a second embodiment, the complementary reactive groups include epoxide groups and primary or secondary amino groups. An epoxide-containing building block reacts with an amine-containing building block under suitable conditions to form a carbon-nitrogen bond, resulting in a β-amino alcohol.

In another embodiment, the complementary reactive groups include aziridine groups and primary or secondary amino groups. Under suitable conditions, an aziridine-containing building block reacts with an amine-containing building block to form a carbon-nitrogen bond, resulting in a 1,2-diamine.

In a third embodiment, the complementary reactive groups include isocyanate groups and primary or secondary amino groups. An isocyanate-containing building block will react with an amino-containing building block under suitable conditions to form a carbon-nitrogen bond, resulting in a urea group.

In a fourth embodiment, the complementary reactive groups include isocyanate groups and hydroxyl groups. An isocyanate-containing building block will react with a hydroxyl-containing building block under suitable conditions to form a carbon-oxygen bond, resulting in a carbamate group.

In a fifth embodiment, the complementary reactive groups include amino groups and carbonyl-containing groups, such as aldehyde or ketone groups. Amines react with such groups via reductive amination to form a new carbon-nitrogen bond.

In a sixth embodiment, the complementary reactive groups include phosphorus ylide groups and aldehyde or ketone groups. A phosphorus-ylide-containing building block will react with an aldehyde- or ketone-containing building block under suitable conditions to form a carbon-carbon double bond, resulting in an alkene.

In a seventh embodiment, the complementary reactive groups react via cycloaddition to form a cyclic structure. One example of such complementary reactive groups are alkynes and organic azides, which react under suitable conditions to form a triazole ring structure. An example of the use of this reaction to link two building blocks is illustrated in FIG. 8. Suitable conditions for such reactions are known in the art and include those disclosed in WO 03/101972, the entire contents of which are incorporated by reference herein.

In an eighth embodiment, the complementary reactive groups are an alkyl halide and a nucleophile, such as an amino group, a hydroxyl group or a carboxyl group. Such groups react under suitable conditions to form a carbon-nitrogen bond (alkyl halide plus amine) or a carbon-oxygen bond (alkyl halide plus hydroxyl or carboxyl group).

In a ninth embodiment, the complementary functional groups are a halogenated heteroaromatic group and a nucleophile, and the building blocks are linked under suitable conditions via aromatic nucleophilic substitution. Suitable halogenated heteroaromatic groups include chlorinated pyrimidines, triazines and purines, which react with nucleophiles, such as amines, under mild conditions in aqueous solution. Representative examples of the reaction of an oligonucleotide-tagged trichlorotriazine with amines are shown in FIGS. 9 and 10. Examples of suitable chlorinated heteroaromatic groups are shown in FIG. 11.
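Taken together, the embodiments above amount to a lookup table of complementary reactive-group pairs and the linkage each forms. The sketch below transcribes them into a dictionary keyed by unordered pairs, so the order of the two groups does not matter. The group labels are informal names chosen for this illustration, and the table covers only the pairs listed above, not every complementary pair known in the art.

```python
from typing import Optional

# Complementary reactive-group pairs from the embodiments above, mapped to the
# linkage formed. frozenset keys make the lookup independent of group order.
LINKAGES = {
    frozenset({"activated carboxyl", "amine"}): "amide",
    frozenset({"reactive sulfonyl", "amine"}): "sulfonamide",
    frozenset({"reactive phosphonyl", "amine"}): "phosphonamidate",
    frozenset({"epoxide", "amine"}): "beta-amino alcohol",
    frozenset({"aziridine", "amine"}): "1,2-diamine",
    frozenset({"isocyanate", "amine"}): "urea",
    frozenset({"isocyanate", "hydroxyl"}): "carbamate",
    frozenset({"aldehyde", "amine"}): "C-N bond (reductive amination)",
    frozenset({"ketone", "amine"}): "C-N bond (reductive amination)",
    frozenset({"phosphorus ylide", "aldehyde"}): "alkene",
    frozenset({"phosphorus ylide", "ketone"}): "alkene",
    frozenset({"alkyne", "azide"}): "triazole",
    frozenset({"alkyl halide", "amine"}): "C-N bond",
    frozenset({"alkyl halide", "hydroxyl"}): "C-O bond",
    frozenset({"alkyl halide", "carboxyl"}): "C-O bond",
    frozenset({"chlorinated heteroaromatic", "amine"}): "C-N bond (SNAr)",
}


def linkage(group_a: str, group_b: str) -> Optional[str]:
    """Linkage formed by two groups, or None if the pair is not listed."""
    return LINKAGES.get(frozenset({group_a, group_b}))


def are_complementary(group_a: str, group_b: str) -> bool:
    return linkage(group_a, group_b) is not None


print(linkage("amine", "isocyanate"))          # urea
print(are_complementary("amine", "amine"))     # False
```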

Additional bond-forming reactions that can be used to join building blocks in the synthesis of the molecules and libraries of the invention include those shown below. The reactions shown below emphasize the reactive functional groups. Various substituents can be present in the reactants, including those labeled R₁, R₂, R₃ and R₄. The possible positions which can be substituted include, but are not limited to, those indicated by R₁, R₂, R₃ and R₄. These substituents can include any suitable chemical moieties, but are preferably limited to those which will not interfere with or significantly inhibit the indicated reaction, and, unless otherwise specified, can include hydrogen, alkyl, substituted alkyl, aryl, substituted aryl, heteroaryl, substituted heteroaryl, alkoxy, aryloxy, arylalkyl, substituted arylalkyl, amino, substituted amino and others as are known in the art. Suitable substituents on these groups include alkyl, aryl, heteroaryl, cyano, halogen, hydroxyl, nitro, amino, mercapto, carboxyl, and carboxamide. Where specified, suitable electron-withdrawing groups include nitro, carboxyl, haloalkyl, such as trifluoromethyl, and others as are known in the art. Examples of suitable electron-donating groups include alkyl, alkoxy, hydroxyl, amino, halogen, acetamido and others as are known in the art.

Addition of a primary amine to an alkene:

Nucleophilic substitution:

Reductive alkylation of an amine:

Palladium catalyzed carbon-carbon bond forming reactions:

Ugi condensation reactions:

Electrophilic aromatic substitution reactions:

X is an electron-donating group.

Imine/iminium/enamine forming reactions:

Cycloaddition reactions:

Diels-Alder cycloaddition

1,3-dipolar cycloaddition, X—Y—Z = C—N—O, C—N—S, N₃

Nucleophilic aromatic substitution reactions:

W is an electron-withdrawing group.

Examples of suitable substituents X and Y include substituted or unsubstituted amino, substituted or unsubstituted alkoxy, substituted or unsubstituted thioalkoxy, substituted or unsubstituted aryloxy and substituted or unsubstituted thioaryloxy.

Heck reaction:

Acetal formation:

Examples of suitable substituents X and Y include substituted and unsubstituted amino, hydroxyl and sulfhydryl; Y is a linker that connects X and Y and is suitable for forming the ring structure found in the product of the reaction.

Aldol reactions:

Examples of suitable substituents X include O, S and NR₃.

Scaffold building blocks which can be used to form the molecules and libraries of the invention include those which have two or more functional groups which can participate in bond forming reactions with peripheral building block precursors, for example, using one or more of the bond forming reactions discussed above. Scaffold moieties may also be synthesized during construction of the libraries and molecules of the invention, for example, using building block precursors which can react in specific ways to form molecules comprising a central molecular moiety to which are appended peripheral functional groups. In one embodiment, a library of the invention comprises molecules comprising a constant scaffold moiety, but different peripheral moieties or different arrangements of peripheral moieties. In certain libraries, all library members comprise a constant scaffold moiety; other libraries can comprise molecules having two or more different scaffold moieties. Examples of scaffold moiety-forming reactions that can be used in the construction of the molecules and libraries of the invention are set forth in the Table. The references cited in the table are incorporated herein by reference in their entirety. The groups R₁, R₂, R₃ and R₄ are limited only in that they should not interfere with, or significantly inhibit, the indicated reaction, and can include hydrogen, alkyl, substituted alkyl, heteroalkyl, substituted heteroalkyl, cycloalkyl, heterocycloalkyl, substituted cycloalkyl, substituted heterocycloalkyl, aryl, substituted aryl, arylalkyl, heteroarylalkyl, substituted arylalkyl, substituted heteroarylalkyl, heteroaryl, substituted heteroaryl, halogen, alkoxy, aryloxy, amino, substituted amino and others as are known in the art. Suitable substituents include, but are not limited to, alkyl, alkoxy, thioalkoxy, nitro, hydroxyl, sulfhydryl, aryloxy, aryl-S—, halogen, carboxy, amino, alkylamino, dialkylamino, arylamino, cyano, cyanate, nitrile, isocyanate, thiocyanate, carbamyl, and substituted carbamyl.

It is to be understood that the synthesis of a functional moiety can proceed via one particular type of coupling reaction, such as, but not limited to, one of the reactions discussed above, or via a combination of two or more coupling reactions, such as two or more of the coupling reactions discussed above. For example, in one embodiment, the building blocks are joined by a combination of amide bond formation (amino and carboxylic acid complementary groups) and reductive amination (amino and aldehyde or ketone complementary groups). Any coupling chemistry can be used, provided that it is compatible with the presence of an oligonucleotide. Double stranded (duplex) oligonucleotide tags, as used in certain embodiments of the present invention, are chemically more robust than single stranded tags, and, therefore, tolerate a broader range of reaction conditions and enable the use of bond-forming reactions that would not be possible with single-stranded tags.

A building block can include one or more functional groups in addition to the reactive group or groups employed to form the functional moiety. One or more of these additional functional groups can be protected to prevent undesired reactions of these functional groups. Suitable protecting groups are known in the art for a variety of functional groups (Greene and Wuts, Protective Groups in Organic Synthesis, second edition, New York: John Wiley and Sons (1991), incorporated herein by reference). Particularly useful protecting groups include t-butyl esters and ethers, acetals, trityl ethers and amines, acetyl esters, trimethylsilyl ethers, trichloroethyl ethers and esters and carbamates.

In one embodiment, each building block comprises two reactive groups, which can be the same or different. For example, each building block added in cycle s can comprise two reactive groups which are the same, but which are both complementary to the reactive groups of the building blocks added at steps s−1 and s+1. In another embodiment, each building block comprises two reactive groups which are themselves complementary. For example, a library comprising polyamide molecules can be produced via reactions between building blocks comprising two primary amino groups and building blocks comprising two activated carboxyl groups. In the resulting compounds there is no N- or C-terminus, as alternate amide groups have opposite directionality. Alternatively, a polyamide library can be produced using building blocks that each comprise an amino group and an activated carboxyl group. In this embodiment, the building blocks added in step n of the cycle will have a free reactive group which is complementary to the available reactive group on the n−1 building block, while, preferably, the other reactive group on the nth building block is protected. For example, if the members of the library are synthesized from the C to N direction, the building blocks added will comprise an activated carboxyl group and a protected amino group.
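The difference between the two polyamide strategies above shows up in the direction of each amide bond. The sketch below is a hypothetical, purely topological model: each building block is reduced to its left and right reactive groups ("N" for an amine, "C" for an activated carboxyl), and each bond is labeled by which neighbor supplies the carbonyl. It ignores protecting groups and reaction conditions.

```python
from typing import List, Tuple

# Each block is reduced to (left group, right group): "N" = amine, "C" = activated carboxyl.
Block = Tuple[str, str]

DIAMINE: Block = ("N", "N")
DIACID: Block = ("C", "C")
AMINO_ACID: Block = ("N", "C")  # written with its amino end on the left


def amide_directions(chain: List[Block]) -> List[str]:
    """Label each bond between neighbors, read left to right, as 'CO-NH'
    (left block supplies the carbonyl) or 'NH-CO' (right block does).
    Raises ValueError if two facing groups are not amine + carboxyl."""
    directions = []
    for left, right in zip(chain, chain[1:]):
        facing = (left[1], right[0])
        if facing == ("C", "N"):
            directions.append("CO-NH")
        elif facing == ("N", "C"):
            directions.append("NH-CO")
        else:
            raise ValueError(f"non-complementary groups {facing}")
    return directions


print(amide_directions([DIAMINE, DIACID, DIAMINE, DIACID]))  # alternating
print(amide_directions([AMINO_ACID] * 4))                    # uniform
```

The diamine/diacid chain yields alternating labels (NH-CO, CO-NH, NH-CO), so there is no single N-to-C direction, whereas the amino-acid chain yields the same label for every bond.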

The functional moieties can be polymeric or oligomeric moieties, such as peptides, peptidomimetics, peptide nucleic acids or peptoids, or they can be small non-polymeric molecules, for example, molecules having a structure comprising a central scaffold and structures arranged about the periphery of the scaffold. Linear polymeric or oligomeric libraries will result from the use of building blocks having two reactive groups, while branched polymeric or oligomeric libraries will result from the use of building blocks having three or more reactive groups, optionally in combination with building blocks having only two reactive groups. Such molecules can be represented by the general formula X₁X₂…Xₙ, where each X is a monomeric unit of a polymer comprising n monomeric units, where n is an integer greater than 1. In the case of oligomeric or polymeric compounds, the terminal building blocks need not comprise two functional groups. For example, in the case of a polyamide library, the C-terminal building block can comprise an amino group, but the presence of a carboxyl group is optional. Similarly, the building block at the N-terminus can comprise a carboxyl group, but need not contain an amino group.

Branched oligomeric or polymeric compounds can also be synthesized provided that at least one building block comprises three functional groups which are reactive with other building blocks. A library of the invention can comprise linear molecules, branched molecules or a combination thereof.

Libraries can also be constructed using, for example, a scaffold building block having two or more reactive groups, in combination with other building blocks having only one available reactive group, for example, where any additional reactive groups are either protected or not reactive with the other reactive groups present in the scaffold building block. In one embodiment, for example, the molecules synthesized can be represented by the general formula X(Y)_(n), where X is a scaffold building block; each Y is a building block linked to X; and n is an integer of at least two, and preferably an integer from 2 to about 6. In one preferred embodiment, the initial building block of cycle 1 is a scaffold building block. In molecules of the formula X(Y)_(n), each Y can be the same or different, but in most members of a typical library, each Y will be different.

In one embodiment, the libraries of the invention comprise polyamide compounds. The polyamide compounds can be composed of building blocks derived from any amino acids, including the twenty naturally occurring α-amino acids, such as alanine (Ala; A), glycine (Gly; G), asparagine (Asn; N), aspartic acid (Asp; D), glutamic acid (Glu; E), histidine (His; H), leucine (Leu; L), lysine (Lys; K), phenylalanine (Phe; F), tyrosine (Tyr; Y), threonine (Thr; T), serine (Ser; S), arginine (Arg; R), valine (Val; V), glutamine (Gln; Q), isoleucine (Ile; I), cysteine (Cys; C), methionine (Met; M), proline (Pro; P) and tryptophan (Trp; W), where the three-letter and one-letter codes for each amino acid are given. In their naturally occurring form, each of the foregoing amino acids exists in the L-configuration, which is to be assumed herein unless otherwise noted. In the present method, however, the D-configuration forms of these amino acids can also be used. These D-amino acids are indicated herein by lower case three- or one-letter code, e.g., ala (a), gly (g), leu (l), gln (q), thr (t), ser (s), and so forth. The building blocks can also be derived from other α-amino acids, including, but not limited to, 3-arylalanines, such as naphthylalanine, phenyl-substituted phenylalanines, including 4-fluoro-, 4-chloro-, 4-bromo- and 4-methylphenylalanine; 3-heteroarylalanines, such as 3-pyridylalanine, 3-thienylalanine, 3-quinolylalanine, and 3-imidazolylalanine; ornithine; citrulline; homocitrulline; sarcosine; homoproline; homocysteine; substituted prolines, such as hydroxyproline and fluoroproline; dehydroproline; norleucine; O-methyltyrosine; O-methylserine; O-methylthreonine and 3-cyclohexylalanine. Each of the preceding amino acids can be utilized in either the D- or L-configuration.
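The case convention for stereochemistry can be expressed as a small parser. This is a hypothetical helper written only for illustration: it covers just the one-letter codes of the twenty naturally occurring amino acids listed above, reading uppercase as L and lowercase as D.

```python
# One-letter to three-letter codes for the twenty naturally occurring
# α-amino acids listed above.
THREE_LETTER = {
    "A": "Ala", "G": "Gly", "N": "Asn", "D": "Asp", "E": "Glu",
    "H": "His", "L": "Leu", "K": "Lys", "F": "Phe", "Y": "Tyr",
    "T": "Thr", "S": "Ser", "R": "Arg", "V": "Val", "Q": "Gln",
    "I": "Ile", "C": "Cys", "M": "Met", "P": "Pro", "W": "Trp",
}


def parse_residues(seq: str) -> list[tuple[str, str]]:
    """Return (three-letter code, configuration) for each one-letter code.

    Uppercase letters are read as L-residues (the default), lowercase letters
    as D-residues, whose three-letter codes are also written in lowercase.
    """
    residues = []
    for ch in seq:
        code = THREE_LETTER[ch.upper()]  # KeyError for codes not listed above
        if ch.isupper():
            residues.append((code, "L"))
        else:
            residues.append((code.lower(), "D"))
    return residues


print(parse_residues("AfW"))  # [('Ala', 'L'), ('phe', 'D'), ('Trp', 'L')]
```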

The building blocks can also be amino acids which are not α-amino acids, such as α-azaamino acids; β-, γ-, δ- and ε-amino acids; and N-substituted amino acids, such as N-substituted glycine, where the N-substituent can be, for example, a substituted or unsubstituted alkyl, aryl, heteroaryl, arylalkyl or heteroarylalkyl group. In one embodiment, the N-substituent is a side chain from a naturally occurring or non-naturally occurring α-amino acid.

The building block can also be a peptidomimetic structure, such as a dipeptide, tripeptide, tetrapeptide or pentapeptide mimetic. Such peptidomimetic building blocks are preferably derived from amino acyl compounds, such that the chemistry of addition of these building blocks to the growing poly(aminoacyl) group is the same as, or similar to, the chemistry used for the other building blocks. The building blocks can also be molecules which are capable of forming bonds which are isosteric with a peptide bond, to form peptidomimetic functional moieties comprising a peptide backbone modification, such as ψ[CH₂S], ψ[CH₂NH], ψ[CSNH], ψ[NHCO], ψ[COCH₂], and ψ[(E) or (Z) CH═CH]. In the nomenclature used above, ψ indicates the absence of an amide bond. The structure that replaces the amide group is specified within the brackets.

In one embodiment, the invention provides a method of synthesizing a compound comprising or consisting of a functional moiety which is operatively linked to an encoding oligonucleotide. The method includes the steps of: (1) providing an initiator compound consisting of an initial functional moiety comprising n building blocks, where n is an integer of 1 or greater, wherein the initial functional moiety comprises at least one reactive group, and wherein the initial functional moiety is operatively linked to an initial oligonucleotide which encodes the n building blocks; (2) reacting the initiator compound with a building block comprising at least one complementary reactive group, wherein the at least one complementary reactive group is complementary to the reactive group of step (1), under suitable conditions for reaction of the reactive group and the complementary reactive group to form a covalent bond; and (3) reacting the initial oligonucleotide with an incoming oligonucleotide in the presence of an enzyme which catalyzes ligation of the initial oligonucleotide and the incoming oligonucleotide, under conditions suitable for ligation of the incoming oligonucleotide and the initial oligonucleotide, thereby producing a molecule which comprises or consists of a functional moiety comprising n+1 building blocks which is operatively linked to an encoding oligonucleotide. If the functional moiety of step (3) comprises a reactive group, steps (1) to (3) can be repeated one or more times, thereby forming cycles 1 to i, where i is an integer of 2 or greater, with the product of step (3) of a cycle s−1, where s is an integer of i or less, becoming the initiator compound of step (1) of cycle s. In each cycle, one building block is added to the growing functional moiety and one oligonucleotide sequence, which encodes the new building block, is added to the growing encoding oligonucleotide.
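The cycle of steps (1) to (3) can be pictured as bookkeeping on a pair: a growing list of building blocks and a growing encoding sequence. The Python sketch below illustrates only that bookkeeping, not the chemistry; the `Conjugate` class, the `run_cycle` helper and the five-base codons are hypothetical, and primer and linker sequences are omitted for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Conjugate:
    """Functional moiety (list of building blocks) plus its encoding oligonucleotide."""
    building_blocks: list[str] = field(default_factory=list)
    oligo: str = ""


def run_cycle(initiator: Conjugate, block: str, codon: str) -> Conjugate:
    """One cycle: couple a building block (step 2), then ligate the
    oligonucleotide encoding it (step 3). The returned product serves as
    the initiator compound of the next cycle."""
    return Conjugate(initiator.building_blocks + [block], initiator.oligo + codon)


# Hypothetical codons; any set of distinct sequences would serve.
CODONS = {"BB-a": "ACGTA", "BB-b": "TTGCA", "BB-c": "GGATC"}

# Initiator compound with n = 1 building block, already linked to its codon.
compound = Conjugate(["BB-a"], CODONS["BB-a"])
for block in ["BB-b", "BB-c"]:
    compound = run_cycle(compound, block, CODONS[block])

print(compound.building_blocks)  # ['BB-a', 'BB-b', 'BB-c']
print(compound.oligo)            # ACGTATTGCAGGATC
```

Because the text states that the coupling and ligation steps can be performed in either order, swapping the two concatenations inside `run_cycle` would give the same result in this model.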

In one embodiment, the initial initiator compound or compounds are generated by reacting a first building block with an oligonucleotide (e.g., an oligonucleotide which includes PCR primer sequences or an initial oligonucleotide) or with a linker to which such an oligonucleotide is attached. In the embodiment set forth in FIG. 5, the linker comprises a reactive group for attachment of a first building block and is attached to an initial oligonucleotide. In this embodiment, reaction of a building block (or, in each of multiple aliquots, one of a collection of building blocks) with the reactive group of the linker, together with addition of an oligonucleotide encoding the building block to the initial oligonucleotide, produces the one or more initial initiator compounds of the process set forth above.

In a preferred embodiment, each individual building block is associated with a distinct oligonucleotide, such that the sequence of nucleotides in the oligonucleotide added in a given cycle identifies the building block added in the same cycle.
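Because each cycle contributes one distinct sequence, the building blocks can be read back from the encoding region. A minimal sketch, assuming fixed-length codons, one codon per cycle and a separate hypothetical lookup table for each cycle (the text does not prescribe codon length or table layout):

```python
# Hypothetical per-cycle codon tables: each cycle has its own set of
# distinct 5-nt sequences, one per building block.
CYCLE_TABLES = [
    {"AAGCT": "BB1-01", "CTTGA": "BB1-02"},
    {"GATCC": "BB2-01", "TCGAA": "BB2-02"},
    {"CCATG": "BB3-01", "GTACA": "BB3-02"},
]
CODON_LEN = 5


def decode(tag: str) -> list[str]:
    """Split an encoding region into one codon per cycle and look each up."""
    if len(tag) != CODON_LEN * len(CYCLE_TABLES):
        raise ValueError("tag length does not match the number of cycles")
    blocks = []
    for cycle, table in enumerate(CYCLE_TABLES):
        codon = tag[cycle * CODON_LEN:(cycle + 1) * CODON_LEN]
        blocks.append(table[codon])  # KeyError if the codon is not in the table
    return blocks


print(decode("CTTGAGATCCGTACA"))  # ['BB1-02', 'BB2-01', 'BB3-02']
```

The codon position gives the cycle, so this scheme also recovers the order of addition of the building blocks.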

The coupling of building blocks and ligation of oligonucleotides will generally occur at similar concentrations of starting materials and reagents. Concentrations of reactants on the order of micromolar to millimolar, for example from about 10 μM to about 10 mM, are preferred in order to have efficient coupling of building blocks.

In certain embodiments, the method further comprises, following step (2), the step of scavenging any unreacted initial functional moiety. Scavenging any unreacted initial functional moiety in a particular cycle prevents the initial functional moiety of the cycle from reacting with a building block added in a later cycle. Such reactions could lead to the generation of functional moieties missing one or more building blocks, potentially leading to a range of functional moiety structures which correspond to a particular oligonucleotide sequence. Such scavenging can be accomplished by reacting any remaining initial functional moiety with a compound which reacts with the reactive group of step (2). Preferably, the scavenger compound reacts rapidly with the reactive group of step (2) and includes no additional reactive groups that can react with building blocks added in later cycles. For example, in the synthesis of a compound where the reactive group of step (2) is an amino group, a suitable scavenger compound is an N-hydroxysuccinimide ester, such as acetic acid N-hydroxysuccinimide ester.
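The benefit of scavenging can be seen by enumerating which functional moieties may sit on a single encoding sequence. The sketch below rests on simplifying assumptions that are not in the text: coupling in each cycle either succeeds or fails completely for a given molecule, ligation of the encoding oligonucleotide always succeeds, and scavenging caps every unreacted moiety with an acetyl group (shown as "Ac").

```python
from itertools import product


def structures_for_tag(blocks: list[str], scavenge: bool) -> set[tuple[str, ...]]:
    """Enumerate the functional moieties that can carry the tag encoding `blocks`.

    blocks[0] is already attached to the initiator; each later cycle either
    couples its block to a given molecule or fails to. Without scavenging, a
    molecule that failed can still couple in a later cycle; with scavenging it
    is capped ("Ac") and takes no further part.
    """
    results = set()
    for outcomes in product([True, False], repeat=len(blocks) - 1):
        moiety = [blocks[0]]
        capped = False
        for block, coupled in zip(blocks[1:], outcomes):
            if capped:
                continue              # a capped moiety no longer reacts
            if coupled:
                moiety.append(block)
            elif scavenge:
                moiety.append("Ac")   # scavenger caps the unreacted group
                capped = True
        results.add(tuple(moiety))
    return results


unscavenged = structures_for_tag(["A", "B", "C"], scavenge=False)
scavenged = structures_for_tag(["A", "B", "C"], scavenge=True)
print(("A", "C") in unscavenged)  # True: lacks B, yet its tag encodes A-B-C
print(("A", "C") in scavenged)    # False
```

In this model, scavenging still leaves truncated, capped moieties on some tags, but no molecule carries a later building block without the earlier ones.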

In another embodiment, the invention provides a method of producing a library of compounds, wherein each compound comprises a functional moiety comprising two or more building block residues which is operatively linked to an oligonucleotide. In a preferred embodiment, the oligonucleotide present in each molecule provides sufficient information to identify the building blocks within the molecule and, optionally, the order of addition of the building blocks. In this embodiment, the method of the invention comprises a method of synthesizing a library of compounds, wherein the compounds comprise a functional moiety comprising two or more building blocks which is operatively linked to an oligonucleotide which identifies the structure of the functional moiety. The method comprises the steps of (1) providing a solution comprising m initiator compounds, wherein m is an integer of 1 or greater, where the initiator compounds consist of a functional moiety comprising n building blocks, where n is an integer of 1 or greater, which is operatively linked to an initial oligonucleotide which identifies the n building blocks; (2) dividing the solution of step (1) into at least r fractions, wherein r is an integer of 2 or greater; (3) reacting each fraction with one of r building blocks, thereby producing r fractions comprising compounds consisting of a functional moiety comprising n+1 building blocks operatively linked to the initial oligonucleotide; and (4) reacting each of the r fractions of step (3) with one of a set of r distinct incoming oligonucleotides under conditions suitable for enzymatic ligation of the incoming oligonucleotide to the initial oligonucleotide, thereby producing r fractions comprising molecules consisting of a functional moiety comprising n+1 building blocks operatively linked to an elongated oligonucleotide which encodes the n+1 building blocks. Optionally, the method can further include the step of (5) recombining the r fractions produced in step (4), thereby producing a solution comprising molecules consisting of a functional moiety comprising n+1 building blocks, which is operatively linked to an elongated oligonucleotide which encodes the n+1 building blocks. Steps (1) to (5) can be conducted one or more times to yield cycles 1 to i, where i is an integer of 2 or greater. In cycle s+1, where s is an integer of i−1 or less, the solution comprising m initiator compounds of step (1) is the solution of step (5) of cycle s. Likewise, the initiator compounds of step (1) of cycle s+1 are the products of step (4) in cycle s.
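The fraction bookkeeping of steps (1) to (5) can be simulated in a few lines. This is a sketch under stated assumptions: each distinct species is tracked as a single copy, every coupling and ligation is assumed to go to completion, and the building block names and three-base codons are hypothetical.

```python
def split_and_pool(initiators, cycles):
    """Simulate split-and-pool synthesis, one copy per distinct species.

    initiators: list of (building_blocks tuple, oligo str) pairs, i.e. the
        m initiator compounds of step (1).
    cycles: one list per cycle of (building_block, codon) pairs, r per cycle.
    """
    pool = list(initiators)
    for cycle in cycles:
        fractions = []
        # Step (2): divide the pool into r fractions, one per building block.
        for block, codon in cycle:
            # Steps (3) and (4): add the block and ligate the codon encoding it.
            fraction = [(blocks + (block,), oligo + codon) for blocks, oligo in pool]
            fractions.append(fraction)
        # Step (5): recombine the fractions; this pool seeds the next cycle.
        pool = [member for fraction in fractions for member in fraction]
    return pool


initiators = [(("X1",), "AAA"), (("X2",), "CCC")]    # m = 2
cycles = [
    [("B1", "GTA"), ("B2", "GTC"), ("B3", "GTG")],  # r = 3
    [("C1", "TAA"), ("C2", "TAC"), ("C3", "TAG")],  # r = 3
]
library = split_and_pool(initiators, cycles)
print(len(library))                          # 18 = 2 * 3 * 3
print(len({oligo for _, oligo in library}))  # 18 distinct encoding sequences
```

Each member's encoding sequence is unique here because every cycle uses a distinct codon per building block, which is the property the split-and-pool scheme relies on.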

Preferably, in step (2) the solution is divided into exactly r fractions in each cycle of the library synthesis. In this embodiment, each fraction is reacted with a unique building block.

In the methods of the invention, the order of addition of the building block and the incoming oligonucleotide is not critical, and steps (2) and (3) of the synthesis of a molecule, and steps (3) and (4) of the library synthesis, can be reversed, i.e., the incoming oligonucleotide can be ligated to the initial oligonucleotide before the new building block is added. In certain embodiments, it may be possible to conduct these two steps simultaneously.

In one embodiment, the building blocks used in the library synthesis are selected from a set of candidate building blocks by evaluating the ability of the candidate building blocks to react with appropriate complementary functional groups under the conditions used for synthesis of the library. Building blocks which are shown to be suitably reactive under such conditions can then be selected for incorporation into the library. The products of a given cycle can, optionally, be purified. When the cycle is an intermediate cycle, i.e., any cycle prior to the final cycle, these products are intermediates and can be purified prior to initiation of the next cycle. If the cycle is the final cycle, the products of the cycle are the final products, and can be purified prior to any use of the compounds. This purification step can, for example, remove unreacted or excess reactants and the enzyme employed for oligonucleotide ligation. Any methods which are suitable for separating the products from other species present in solution can be used, including liquid chromatography, such as high performance liquid chromatography (HPLC), and precipitation with a suitable co-solvent, such as ethanol. Suitable methods for purification will depend upon the nature of the products and the solvent system used for synthesis.

The reactions are, preferably, conducted in aqueous solution, such as a buffered aqueous solution, but can also be conducted in mixed aqueous/organic media consistent with the solubility properties of the building blocks, the oligonucleotides, the intermediates and final products, and the enzyme used to catalyze the oligonucleotide ligation.

It is to be understood that the theoretical number of compounds produced by a given cycle in the method described above is the product of the number of different initiator compounds, m, used in the cycle and the number of distinct building blocks added in the cycle, r. The actual number of distinct compounds produced in the cycle can be as high as the product of r and m (r×m), but could be lower, given differences in reactivity of certain building blocks with certain other building blocks. For example, the kinetics of addition of a particular building block to a particular initiator compound may be such that, on the time scale of the synthetic cycle, little to none of the product of that reaction is produced.
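The counting argument can be made concrete. In the sketch below, `too_slow` is a hypothetical set of initiator and building block pairs assumed to yield essentially no product within one cycle; the multi-cycle bound simply multiplies m by each cycle's r, and all example numbers are illustrative only.

```python
from itertools import product
from math import prod


def cycle_product_counts(initiators, blocks, too_slow=frozenset()):
    """Return (theoretical, actual) product counts for one cycle.

    The theoretical count is m * r. Pairs in `too_slow` (hypothetical) are
    assumed to give essentially no product and are left out of the actual count.
    """
    theoretical = len(initiators) * len(blocks)
    actual = sum(1 for pair in product(initiators, blocks) if pair not in too_slow)
    return theoretical, actual


def theoretical_library_size(m, r_per_cycle):
    """Upper bound after several cycles: m multiplied by each cycle's r."""
    return m * prod(r_per_cycle)


inits = ["I1", "I2", "I3", "I4"]  # m = 4
blocks = ["B1", "B2", "B3"]       # r = 3
print(cycle_product_counts(inits, blocks))                                # (12, 12)
print(cycle_product_counts(inits, blocks, {("I2", "B3"), ("I4", "B1")}))  # (12, 10)
print(theoretical_library_size(1, [100, 100, 100]))                      # 1000000
```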

In certain embodiments, a common building block is added prior to cycle 1, following the last cycle or in between any two cycles. For example, when the functional moiety is a polyamide, a common N-terminal capping building block can be added after the final cycle. A common building block can also be introduced between any two cycles, for example, to add a functional group, such as an alkyne or azide group, which can be utilized to modify the functional moieties, for example by cyclization, following library synthesis.

The term “operatively linked”, as used herein, means that two chemical structures are linked together in such a way as to remain linked through the various manipulations they are expected to undergo. Typically, the functional moiety and the encoding oligonucleotide are linked covalently via an appropriate linking group. The linking group is a bivalent moiety with a site of attachment for the oligonucleotide and a site of attachment for the functional moiety. For example, when the functional moiety is a polyamide compound, the polyamide compound can be attached to the linking group at its N-terminus, its C-terminus or via a functional group on one of the side chains. The linking group is sufficient to separate the polyamide compound and the oligonucleotide by at least one atom, and preferably by more than one atom, such as at least two, at least three, at least four, at least five or at least six atoms. Preferably, the linking group is sufficiently flexible to allow the polyamide compound to bind target molecules in a manner which is independent of the oligonucleotide.

In one embodiment, the linking group is attached to the N-terminus of the polyamide compound and the 5′-phosphate group of the oligonucleotide. For example, the linking group can be derived from a linking group precursor comprising an activated carboxyl group on one end and an activated ester on the other end. Reaction of the linking group precursor with the N-terminal nitrogen atom will form an amide bond connecting the linking group to the polyamide compound or N-terminal building block, while reaction of the linking group precursor with the 5′-hydroxy group of the oligonucleotide will result in attachment of the oligonucleotide to the linking group via an ester linkage. The linking group can comprise, for example, a polymethylene chain, such as a —(CH₂)_(n)— chain, or a poly(ethylene glycol) chain, such as a —(CH₂CH₂O)_(n)— chain, where in both cases n is an integer from 1 to about 20. Preferably, n is from 2 to about 12, more preferably from about 4 to about 10. In one embodiment, the linking group comprises a hexamethylene (—(CH₂)₆—) group.

When the building blocks are amino acid residues, the resulting functional moiety is a polyamide. The amino acids can be coupled using any suitable chemistry for the formation of amide bonds. Preferably, the coupling of the amino acid building blocks is conducted under conditions which are compatible with enzymatic ligation of oligonucleotides, for example, at neutral or near-neutral pH and in aqueous solution. In one embodiment, the polyamide compound is synthesized in the C-terminal to N-terminal direction. In this embodiment, the first, or C-terminal, building block is coupled at its carboxyl group to an oligonucleotide via a suitable linking group. The first building block is reacted with the second building block, which preferably has an activated carboxyl group and a protected amino group. Any activating/protecting group strategy which is suitable for solution phase amide bond formation can be used. For example, suitable activated carboxyl species include acyl fluorides (U.S. Pat. No. 5,360,928, incorporated herein by reference in its entirety), symmetrical anhydrides and N-hydroxysuccinimide esters. The acyl groups can also be activated in situ, as is known in the art, by reaction with a suitable activating compound. Suitable activating compounds include dicyclohexylcarbodiimide (DCC), diisopropylcarbodiimide (DIC), 1-ethoxycarbonyl-2-ethoxy-1,2-dihydroquinoline (EEDQ), 1-ethyl-3-(3-dimethylaminopropyl)carbodiimide hydrochloride (EDC), n-propane-phosphonic anhydride (PPA), N,N-bis(2-oxo-3-oxazolidinyl)imido-phosphoryl chloride (BOP-Cl), bromo-tris-pyrrolidinophosphonium hexafluorophosphate (PyBrop), diphenylphosphoryl azide (DPPA), Castro's reagent (BOP, PyBop), O-benzotriazolyl-N,N,N′,N′-tetramethyluronium salts (HBTU), diethylphosphoryl cyanide (DEPCN), 2,5-diphenyl-2,3-dihydro-3-oxo-4-hydroxy-thiophene dioxide (Steglich's reagent; HOTDO), 1,1′-carbonyl-diimidazole (CDI), and 4-(4,6-dimethoxy-1,3,5-triazin-2-yl)-4-methylmorpholinium chloride (DMT-MM).
The coupling reagents can be employed alone or in combination with additives such as N,N-dimethyl-4-aminopyridine (DMAP), N-hydroxybenzotriazole (HOBt), N-hydroxybenzotriazine (HOOBt), N-hydroxysuccinimide (HOSu), N-hydroxyazabenzotriazole (HOAt), azabenzotriazolyl-tetramethyluronium salts (HATU, HAPyU) or 2-hydroxypyridine. In certain embodiments, synthesis of a library requires the use of two or more activation strategies, to enable the use of a structurally diverse set of building blocks. For each building block, one skilled in the art can determine the appropriate activation strategy.

The N-terminal protecting group can be any protecting group which is compatible with the conditions of the process, for example, protecting groups which are suitable for solution phase synthesis conditions. A preferred protecting group is the fluorenylmethoxycarbonyl (“Fmoc”) group. Any potentially reactive functional groups on the side chain of the aminoacyl building block may also need to be suitably protected. Preferably, the side chain protecting group is orthogonal to the N-terminal protecting group, that is, the side chain protecting group is removed under conditions which are different than those required for removal of the N-terminal protecting group. Suitable side chain protecting groups include the nitroveratryl group, which can be used to protect both side chain carboxyl groups and side chain amino groups. Another suitable side chain amine protecting group is the N-pent-4-enoyl group.

The building blocks can be modified following incorporation into the functional moiety, for example, by a suitable reaction involving a functional group on one or more of the building blocks. Building block modification can take place following addition of the final building block or at any intermediate point in the synthesis of the functional moiety, for example, after any cycle of the synthetic process. When a library of bifunctional molecules of the invention is synthesized, building block modification can be carried out on the entire library or on a portion of the library, thereby increasing the degree of complexity of the library. Suitable building block modifying reactions include those reactions that can be performed under conditions compatible with the functional moiety and the encoding oligonucleotide. Examples of such reactions include acylation and sulfonation of amino groups or hydroxyl groups, alkylation of amino groups, esterification or thioesterification of carboxyl groups, amidation of carboxyl groups, epoxidation of alkenes, and other reactions as are known in the art. When the functional moiety includes a building block having an alkyne or an azide functional group, the azide/alkyne cycloaddition reaction can be used to derivatize the building block. For example, a building block including an alkyne can be reacted with an organic azide, or a building block including an azide can be reacted with an alkyne, in either case forming a triazole. Such modification reactions can be used to append a variety of chemical structures to the functional moiety, including carbohydrates, metal binding moieties and structures for targeting certain biomolecules or tissue types.

In another embodiment, the functional moiety comprises a linear series of building blocks and this linear series is cyclized using a suitable reaction. For example, if at least two building blocks in the linear array include sulfhydryl groups, the sulfhydryl groups can be oxidized to form a disulfide linkage, thereby cyclizing the linear array. The functional moieties can, for instance, be oligopeptides which include two or more L- or D-cysteine and/or L- or D-homocysteine moieties. The building blocks can also include other functional groups capable of reacting together to cyclize the linear array, such as carboxyl groups and amino or hydroxyl groups.

In a preferred embodiment, one of the building blocks in the linear array comprises an alkyne group and another building block in the linear array comprises an azide group. The azide and alkyne groups can be induced to react via cycloaddition, resulting in the formation of a macrocyclic structure. In the example illustrated in FIG. 9, the functional moiety is a polypeptide comprising a propargylglycine building block at its C-terminus and an azidoacetyl group at its N-terminus. Reaction of the alkyne and the azide group under suitable conditions results in formation of a cyclic compound, which includes a triazole structure within the macrocycle. In the case of a library, in one embodiment, each member of the library comprises alkyne- and azide-containing building blocks and can be cyclized in this way. In a second embodiment, all members of the library comprise alkyne- and azide-containing building blocks, but only a portion of the library is cyclized. In a third embodiment, only certain functional moieties include alkyne- and azide-containing building blocks, and only these molecules are cyclized. In the foregoing second and third embodiments, the library, following the cycloaddition reaction, will include both cyclic and linear functional moieties.

In some embodiments of the invention in which the same building block, e.g., a triazine, is added to each and all of the fractions of the library during a particular synthesis step, it may not be necessary to add an oligonucleotide tag encoding that building block.

Oligonucleotides may be ligated by chemical or enzymatic methods. In one embodiment, oligonucleotides are ligated by chemical means. Chemical ligation of DNA and RNA may be performed using reagents such as water-soluble carbodiimide and cyanogen bromide as taught by, for example, Shabarova, et al. (1991) Nucleic Acids Research, 19, 4247-4251, Fedorova, et al. (1996) Nucleosides and Nucleotides, 15, 1137-1147, and Carriero and Damha (2003) Journal of Organic Chemistry, 68, 8328-8338. In one embodiment, chemical ligation is performed using cyanogen bromide, 5 M in acetonitrile, in a 1:10 v/v ratio with 5′-phosphorylated oligonucleotide in a pH 7.6 buffer (1 M MES + 20 mM MgCl₂) at 0 °C for 1 to 5 minutes. The oligonucleotides may be double-stranded, preferably with an overhang of about 5 to about 14 bases. The oligonucleotides may also be single-stranded, in which case a splint with an overlap of about 6 bases with each of the oligonucleotides to be ligated is employed to position the reactive 5′ and 3′ moieties in proximity with each other.

In another embodiment, the oligonucleotides are ligated using enzymatic methods. In one embodiment, the initial building block is operatively linked to an initial oligonucleotide. Prior to or following coupling of a second building block to the initial building block, a second oligonucleotide sequence which identifies the second building block is ligated to the initial oligonucleotide. Methods for ligating the initial oligonucleotide sequence and the incoming oligonucleotide sequence are set forth in FIGS. 1 and 2. In FIG. 1, the initial oligonucleotide is double-stranded, and one strand includes an overhang sequence which is complementary to one end of the second oligonucleotide and brings the second oligonucleotide into contact with the initial oligonucleotide. Preferably, the overhanging sequence of the initial oligonucleotide and the complementary sequence of the second oligonucleotide are both at least about 4 bases; more preferably, both sequences are the same length. The initial oligonucleotide and the second oligonucleotide can be ligated using a suitable enzyme. If the initial oligonucleotide is linked to the first building block at the 5′ end of one of the strands (the “top strand”), then the strand which is complementary to the top strand (the “bottom strand”) will include the overhang sequence at its 5′ end, and the second oligonucleotide will include a complementary sequence at its 5′ end. Following ligation of the second oligonucleotide, a strand can be added which is complementary to the sequence of the second oligonucleotide that lies 3′ to the overhang-complementary sequence, and which includes additional overhang sequence.

In one embodiment, the oligonucleotide is elongated as set forth in FIG. 2. The oligonucleotide bound to the growing functional moiety and the incoming oligonucleotide are positioned for ligation by the use of a “splint” sequence, which includes a region which is complementary to the 3′ end of the initial oligonucleotide and a region which is complementary to the 5′ end of the incoming oligonucleotide. The splint brings the 3′ end of the initial oligonucleotide into proximity with the 5′ end of the incoming oligonucleotide, and ligation is accomplished enzymatically. In the example illustrated in FIG. 2, the initial oligonucleotide consists of 16 nucleobases and the splint is complementary to the 6 bases at its 3′ end. The incoming oligonucleotide consists of 12 nucleobases, and the splint is complementary to the 6 bases at its 5′ terminus. The length of the splint and the lengths of the complementary regions are not critical. However, the complementary regions should be sufficiently long to enable stable dimer formation under the conditions of the ligation, but not so long as to yield an excessively large encoding oligonucleotide in the final molecules. It is preferred that the complementary regions are from about 4 bases to about 12 bases, more preferably from about 5 bases to about 10 bases, and most preferably from about 5 bases to about 8 bases in length.
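The splint sequence follows directly from the two ends it must bridge. A minimal sketch, assuming DNA written 5′ to 3′ and perfect Watson-Crick pairing; the 16-nt and 12-nt sequences are made up to match the lengths given for FIG. 2, and `design_splint` is a hypothetical helper:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def design_splint(initial: str, incoming: str, k_initial: int = 6, k_incoming: int = 6) -> str:
    """Return a splint (5'->3') pairing with the 3'-terminal k_initial bases of
    the initial oligonucleotide and the 5'-terminal k_incoming bases of the
    incoming oligonucleotide, so the two ends abut at the ligation site."""
    junction = initial[-k_initial:] + incoming[:k_incoming]
    return revcomp(junction)


initial = "GATCGGACTTAGCCAT"   # hypothetical 16-nt initial oligonucleotide
incoming = "TGCAAGTCCTGA"      # hypothetical 12-nt incoming oligonucleotide
splint = design_splint(initial, incoming)
print(splint)                    # CTTGCAATGGCT
print(len(initial + incoming))   # 28 (length of the ligation product)
```

Changing `k_initial` and `k_incoming` trades duplex stability against splint length, which is the balance the preferred ranges above describe.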

The split-and-pool methods for library synthesis set forth herein ensure that each unique functional moiety is operatively linked to at least one unique oligonucleotide sequence which identifies the functional moiety. If two or more different oligonucleotide tags are used for at least one building block in at least one of the synthetic cycles, each distinct functional moiety comprising that building block will be encoded by multiple oligonucleotides. For example, if 2 oligonucleotide tags are used for each building block during the synthesis of a 4-cycle library, there will be 16 DNA sequences (2⁴) that encode each unique functional moiety. There are several potential advantages to encoding each unique functional moiety with multiple sequences. First, selection of different combinations of tag sequences encoding the same functional moiety assures that those molecules were independently selected. Second, selection of different combinations of tag sequences encoding the same functional moiety eliminates the possibility that the selection was based on the sequence of the oligonucleotide. Third, a technical artifact can be recognized if sequence analysis suggests that a particular functional moiety is highly enriched, but only one sequence combination out of many possibilities appears. Multiple tagging can be accomplished by having independent split reactions with the same building block but a different oligonucleotide tag. Alternatively, multiple tagging can be accomplished by mixing an appropriate ratio of each tag in a single tagging reaction with an individual building block.
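The tag arithmetic and the third check above can be sketched as follows. The read format (pairs of functional moiety and tag combination, already decoded from sequencing data) and the example counts are hypothetical; the sketch shows only how recovery of one moiety through different tag combinations might be tallied.

```python
from collections import defaultdict
from itertools import product

TAGS_PER_BLOCK = 2  # distinct tags per building block, as in the example above
N_CYCLES = 4

# Every combination of tag choices encodes the same functional moiety.
combos = list(product(range(TAGS_PER_BLOCK), repeat=N_CYCLES))
print(len(combos))  # 16


def distinct_combos(reads):
    """Count distinct tag combinations observed per functional moiety.

    reads: iterable of (moiety, tag_combination) pairs (hypothetical format).
    """
    seen = defaultdict(set)
    for moiety, combo in reads:
        seen[moiety].add(combo)
    return {moiety: len(c) for moiety, c in seen.items()}


# Hypothetical selection output: M1 recovered via three independent tag
# combinations; M2 is abundant but seen with only one combination.
reads = [("M1", c) for c in combos[:3]] + [("M2", combos[5])] * 50
print(distinct_combos(reads))  # {'M1': 3, 'M2': 1}
```

In this example M2 matches the pattern described as a likely technical artifact: high enrichment carried by a single sequence combination.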

In one embodiment, the initial oligonucleotide is double-stranded and the two strands are covalently joined. One means of covalently joining the two strands is shown in FIG. 3, in which a linking moiety is used to link the two strands and the functional moiety. The linking moiety can be any chemical structure which comprises a first functional group which is adapted to react with a building block, a second functional group which is adapted to react with the 3′-end of an oligonucleotide, and a third functional group which is adapted to react with the 5′-end of an oligonucleotide. Preferably, the second and third functional groups are oriented so as to position the two oligonucleotide strands in a relative orientation that permits hybridization of the two strands. For example, the linking moiety can have the general structure (I):

where A is a functional group that can form a covalent bond with a building block, B is a functional group that can form a bond with the 5′-end of an oligonucleotide, and C is a functional group that can form a bond with the 3′-end of an oligonucleotide. D, F and E are chemical groups that link functional groups A, C and B to S, which is a core atom or scaffold. Preferably, D, E and F are each independently a chain of atoms, such as an alkylene chain or an oligo(ethylene glycol) chain; D, E and F can be the same or different, and are preferably effective to allow hybridization of the two oligonucleotides and synthesis of the functional moiety. In one embodiment, the trivalent linker has the structure

In this embodiment, the NH group is available for attachment to a building block, while the terminal phosphate groups are available for attachment to an oligonucleotide.

In embodiments in which the initial oligonucleotide is double-stranded, the incoming oligonucleotides are also double-stranded. As shown in FIG. 3, the initial oligonucleotide can have one strand which is longer than the other, providing an overhang sequence. In this embodiment, the incoming oligonucleotide includes an overhang sequence which is complementary to the overhang sequence of the initial oligonucleotide. Hybridization of the two complementary overhang sequences brings the incoming oligonucleotide into position for ligation to the initial oligonucleotide. This ligation can be performed enzymatically using a DNA or RNA ligase. The overhang sequences of the incoming oligonucleotide and the initial oligonucleotide are preferably the same length and consist of two or more nucleotides, preferably from 2 to about 10 nucleotides, more preferably from 2 to about 6 nucleotides. In one preferred embodiment, the incoming oligonucleotide is a double-stranded oligonucleotide having an overhang sequence at each end. The overhang sequence at one end is complementary to the overhang sequence of the initial oligonucleotide, while, after ligation of the incoming oligonucleotide and the initial oligonucleotide, the overhang sequence at the other end becomes the overhang sequence of the initial oligonucleotide of the next cycle. In one embodiment, the three overhang sequences are all 2 to 6 nucleotides in length, and the encoding sequence of the incoming oligonucleotide is from 3 to 10 nucleotides in length, preferably 3 to 6 nucleotides in length. In a particular embodiment, the overhang sequences are all 2 nucleotides in length and the encoding sequence is 5 nucleotides in length.
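The overhang-directed elongation described above can be sketched at the sequence level. This is a minimal sketch under stated assumptions: every overhang is modeled as a 5′ overhang, the growing tag carries its overhang on the bottom strand, overhangs are 2 nucleotides and encoding sequences 5 nucleotides (as in the particular embodiment), and all sequences are hypothetical. It checks only base-pairing compatibility, not the enzymology of the ligase.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]

@dataclass(frozen=True)
class Tag:
    top: str       # top strand, 5'->3'
    overhang: str  # single-stranded 5' overhang of the bottom strand, 5'->3'

    @property
    def bottom(self) -> str:
        # Bottom strand, 5'->3': the overhang followed by the base-paired region.
        return self.overhang + revcomp(self.top)

@dataclass(frozen=True)
class Incoming:
    left: str      # 5' overhang on the top strand; must pair with Tag.overhang
    encoding: str  # building-block-specific encoding sequence
    right: str     # 5' overhang on the bottom strand; becomes the next cycle's overhang

def ligate(tag: Tag, incoming: Incoming) -> Tag:
    """Anneal complementary overhangs and join the strands; return the elongated tag."""
    if incoming.left != revcomp(tag.overhang):
        raise ValueError("overhangs are not complementary")
    return Tag(top=tag.top + incoming.left + incoming.encoding,
               overhang=incoming.right)

start = Tag(top="ACGTACGT", overhang="GA")
step1 = ligate(start, Incoming(left="TC", encoding="AACCG", right="CT"))
step2 = ligate(step1, Incoming(left="AG", encoding="GGTTA", right="TG"))
print(step2.top, step2.overhang)
```

Because each incoming duplex leaves a fresh overhang, the same compatibility check applies unchanged in every cycle.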

In the embodiment illustrated in FIG. 4, the incoming strand has a region at its 3′ end which is complementary to the 3′ end of the initial oligonucleotide, leaving overhangs at the 5′ ends of both strands. These 5′ overhangs can be filled in by extending the recessed 3′ ends using, for example, a DNA polymerase such as Vent polymerase, resulting in a double-stranded elongated oligonucleotide. The bottom strand of this oligonucleotide can be removed, and additional sequence added to the 3′ end of the top strand using the same method.
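The fill-in step can be expressed as a small sequence calculation. The Python sketch below assumes both strands are written 5′→3′ and annealed only through their 3′-terminal `k` bases; each strand is then extended using the other strand's 5′ overhang as template. The sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]

def fill_in(top: str, bottom: str, k: int):
    """Extend both recessed 3' ends of two strands annealed via their 3'-terminal k bases.

    Both strands are given 5'->3'. Returns the fully double-stranded product.
    """
    if top[-k:] != revcomp(bottom[-k:]):
        raise ValueError("3'-terminal regions are not complementary")
    new_top = top + revcomp(bottom[:-k])     # top copies the bottom strand's 5' overhang
    new_bottom = bottom + revcomp(top[:-k])  # bottom copies the top strand's 5' overhang
    return new_top, new_bottom

new_top, new_bottom = fill_in("GATTACAGGC", "TTTCAGCC", k=3)
print(new_top)
```

Removing `new_bottom` and annealing a further strand to the 3′ end of `new_top` would repeat the same step, mirroring the iterative use described above.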

The encoding oligonucleotide tag is formed as the result of the successive addition of oligonucleotides that identify each successive building block. In one embodiment of the methods of the invention, the successive oligonucleotide tags may be coupled by enzymatic ligation to produce an encoding oligonucleotide.

Enzyme-catalyzed ligation of oligonucleotides can be performed using any enzyme that has the ability to ligate nucleic acid fragments. Exemplary enzymes include ligases, polymerases, and topoisomerases. In specific embodiments of the invention, DNA ligase (EC 6.5.1.1), DNA polymerase (EC 2.7.7.7), RNA polymerase (EC 2.7.7.6) or topoisomerase (EC 5.99.1.2) are used to ligate the oligonucleotides. Enzymes contained in each EC class can be found, for example, as described in Bairoch (2000) Nucleic Acids Research 28:304-5.

In a preferred embodiment, the oligonucleotides used in the methods of the invention are oligodeoxynucleotides and the enzyme used to catalyze the oligonucleotide ligation is DNA ligase. In order for ligation to occur in the presence of the ligase, i.e., for a phosphodiester bond to be formed between two oligonucleotides, one oligonucleotide must have a free 5′ phosphate group and the other oligonucleotide must have a free 3′ hydroxyl group. Exemplary ligases that may be used in the methods of the invention include T4 DNA ligase, Taq DNA ligase, T4 RNA ligase, and DNA ligase (E. coli) (all available from, for example, New England Biolabs, MA).
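The end-chemistry requirement for ligation can be captured in a few lines. This Python sketch models only the 5′-phosphate / 3′-hydroxyl rule stated above; it deliberately ignores the need for a template or splint and any enzyme-specific conditions, and the `Oligo` class and sequences are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Oligo:
    seq: str                  # written 5'->3'
    phosphate_5p: bool        # free 5'-phosphate present
    hydroxyl_3p: bool = True  # free 3'-hydroxyl present

def ligate(upstream: Oligo, downstream: Oligo) -> Oligo:
    """Join upstream's 3' end to downstream's 5' end via a phosphodiester bond."""
    if not upstream.hydroxyl_3p:
        raise ValueError("upstream oligonucleotide lacks a free 3'-hydroxyl")
    if not downstream.phosphate_5p:
        raise ValueError("downstream oligonucleotide lacks a free 5'-phosphate")
    # The product keeps the outer ends of the two partners.
    return Oligo(upstream.seq + downstream.seq, upstream.phosphate_5p, downstream.hydroxyl_3p)

joined = ligate(Oligo("ACGT", phosphate_5p=False), Oligo("GGCC", phosphate_5p=True))
print(joined)
```

The same rule explains why an oligonucleotide lacking a 5′ phosphate cannot be sealed to its upstream neighbor and leaves a nick.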

One of skill in the art will understand that each enzyme used for ligation has optimal activity under specific conditions, e.g., temperature, buffer concentration, pH and time. Each of these conditions can be adjusted, for example, according to the manufacturer's instructions, to obtain optimal ligation of the oligonucleotide tags.

The incoming oligonucleotide can be of any desirable length, but is preferably at least three nucleobases in length. More preferably, the incoming oligonucleotide is 4 or more nucleobases in length. In one embodiment, the incoming oligonucleotide is from 3 to about 12 nucleobases in length. It is preferred that the oligonucleotides of the molecules in the libraries of the invention have a common terminal sequence which can serve as a primer for PCR, as is known in the art. Such a common terminal sequence can be incorporated as the terminal end of the incoming oligonucleotide added in the final cycle of the library synthesis, or it can be added following library synthesis, for example, using the enzymatic ligation methods disclosed herein.

A preferred embodiment of the method of the invention is set forth in FIG. 5. The process begins with a synthesized DNA sequence which is attached at its 5′ end to a linker which terminates in an amino group. In step 1, this starting DNA sequence is ligated to an incoming DNA sequence in the presence of a splint DNA strand, DNA ligase and dithiothreitol in Tris buffer. This yields a tagged DNA sequence which can then be used directly in the next step or purified, for example, using HPLC or ethanol precipitation, before proceeding to the next step. In step 2, the tagged DNA is reacted with a protected activated amino acid, in this example an Fmoc-protected amino acid fluoride, yielding a protected amino acid-DNA conjugate. In step 3, the protected amino acid-DNA conjugate is deprotected, for example, in the presence of piperidine, and the resulting deprotected conjugate is, optionally, purified, for example, by HPLC or ethanol precipitation. The deprotected conjugate is the product of the first synthesis cycle, and becomes the starting material for the second cycle, which adds a second amino acid residue to the free amino group of the deprotected conjugate.
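The ordering of the three steps in each cycle can be modeled as a simple state machine. This Python sketch is bookkeeping only, not chemistry: the `Conjugate` record, the codon sequences and the residue names are hypothetical, and the optional purification steps are omitted. Its one rule is that an amino acid can be coupled only to a free (deprotected) amine.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Conjugate:
    tag: str                      # encoding DNA, written 5'->3'
    residues: tuple = ()          # amino acids added so far
    fmoc_protected: bool = False  # True while the terminal amine carries Fmoc

def ligate_tag(c: Conjugate, codon: str) -> Conjugate:
    """Step 1: record the next building block by ligating its DNA codon."""
    return Conjugate(c.tag + codon, c.residues, c.fmoc_protected)

def couple_fmoc_amino_acid(c: Conjugate, aa: str) -> Conjugate:
    """Step 2: acylate the free amine with an Fmoc-protected amino acid fluoride."""
    if c.fmoc_protected:
        raise ValueError("amine is still Fmoc-protected; deprotect first")
    return Conjugate(c.tag, c.residues + (aa,), True)

def deprotect(c: Conjugate) -> Conjugate:
    """Step 3: remove Fmoc (for example with piperidine) to expose the amine again."""
    return Conjugate(c.tag, c.residues, False)

def synthesis_cycle(c: Conjugate, codon: str, aa: str) -> Conjugate:
    return deprotect(couple_fmoc_amino_acid(ligate_tag(c, codon), aa))

start = Conjugate(tag="ACGTACGT")  # linker-attached starting DNA with a free amine
product = synthesis_cycle(synthesis_cycle(start, "GGTCA", "Gly"), "TTACG", "Ala")
print(product)
```

Because each cycle ends with deprotection, its product is immediately a valid starting material for the next cycle.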

In embodiments in which PCR is to be used to amplify and/or sequence the encoding oligonucleotides of selected molecules, the encoding oligonucleotides may include, for example, PCR primer sequences and/or sequencing primers (e.g., 3′-GACTACCGCGCTCCCTCCG-5′ and 3′-GACTCGCCCGACCGTTCCG-5′). A PCR primer sequence can be included, for example, in the initial oligonucleotide prior to the first cycle of synthesis, and/or it can be included with the first incoming oligonucleotide, and/or it can be ligated to the encoding oligonucleotide following the final cycle of library synthesis, and/or it can be included in the incoming oligonucleotide of the final cycle. The PCR primer sequences added following the final cycle of library synthesis and/or in the incoming oligonucleotide of the final cycle are referred to herein as “capping sequences”.
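The example primers above are written 3′→5′. Rewriting the same strand in the conventional 5′→3′ orientation is a plain string reversal, with no complementing, as this short Python sketch shows.

```python
def to_5prime_first(seq_3to5: str) -> str:
    """Rewrite a strand given 3'->5' in conventional 5'->3' order (same strand, not its complement)."""
    return seq_3to5[::-1]

primer_a = to_5prime_first("GACTACCGCGCTCCCTCCG")
primer_b = to_5prime_first("GACTCGCCCGACCGTTCCG")
print(primer_a)
print(primer_b)
```

Written this way, the two sequences read GCCTCCCTCGCGCCATCAG and GCCTTGCCAGCCCGCTCAG, which are the constant 5′ segments of the 3′ and 5′ sample-identifier primers listed later in this section.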

In one embodiment, the PCR primer sequence is designed into the encoding oligonucleotide tag. For example, a PCR primer sequence may be incorporated into the initial oligonucleotide tag and/or into the final oligonucleotide tag. In one embodiment, the same PCR primer sequence is incorporated into the initial and final oligonucleotide tags. In another embodiment, a first PCR primer sequence is incorporated into the initial oligonucleotide tag and a second PCR primer sequence is incorporated into the final oligonucleotide tag. Alternatively, the second PCR primer sequence may be incorporated into the capping sequence as described herein. In preferred embodiments, the PCR primer sequence is at least about 5, 7, 10, 13, 15, 17, 20, 22, or 25 nucleotides in length.

PCR primer sequences suitable for use in the libraries of the invention are known in the art; suitable primers and methods are set forth, for example, in Innis, et al., eds., PCR Protocols: A Guide to Methods and Applications, San Diego: Academic Press (1990), the contents of which are incorporated herein by reference in their entirety. Other suitable primers for use in the construction of the libraries described herein are those primers described in PCT Publications WO 2004/069849 and WO 2005/003375, the contents of which are expressly incorporated herein by reference.

The term “polynucleotide” as used herein in reference to primers, probes and nucleic acid fragments or segments to be synthesized by primer extension is defined as a molecule comprised of two or more deoxyribonucleotides, preferably more than three.

The term “primer” as used herein refers to a polynucleotide, whether purified from a nucleic acid restriction digest or produced synthetically, which is capable of acting as a point of initiation of nucleic acid synthesis when placed under conditions in which synthesis of a primer extension product which is complementary to a nucleic acid strand is induced, i.e., in the presence of nucleotides and an agent for polymerization such as DNA polymerase, reverse transcriptase and the like, and at a suitable temperature and pH. The primer is preferably single-stranded for maximum efficiency, but may alternatively be in double-stranded form. If double-stranded, the primer is first treated to separate it from its complementary strand before being used to prepare extension products. Preferably, the primer is a polydeoxyribonucleotide. The primer must be sufficiently long to prime the synthesis of extension products in the presence of the agents for polymerization. The exact lengths of the primers will depend on many factors, including temperature and the source of primer.

The primers used herein are selected to be “substantially” complementary to the different strands of each specific sequence to be amplified. This means that the primer must be sufficiently complementary to hybridize non-randomly with its respective template strand. Therefore, the primer sequence may or may not reflect the exact sequence of the template.

The polynucleotide primers can be prepared using any suitable method, such as, for example, the phosphotriester or phosphodiester methods described in Narang et al. (1979) Meth. Enzymol. 68:90; U.S. Pat. No. 4,356,270; U.S. Pat. No. 4,458,066; U.S. Pat. No. 4,416,988; U.S. Pat. No. 4,293,652; and Brown et al. (1979) Meth. Enzymol. 68:109. The contents of all the foregoing documents are incorporated herein by reference.

In cases in which the PCR primer sequences are included in an incoming oligonucleotide, these incoming oligonucleotides will preferably be significantly longer than the incoming oligonucleotides added in the other cycles, because they will include both an encoding sequence and a PCR primer sequence.

In one embodiment, the capping sequence is added after the addition of the final building block and final incoming oligonucleotide, and the synthesis of a library as set forth herein includes the step of ligating the capping sequence to the encoding oligonucleotide, such that the oligonucleotide portion of substantially all of the library members terminates in a sequence that includes a PCR primer sequence. Preferably, the capping sequence is added by ligation to the pooled fractions which are the products of the final synthetic cycle. The capping sequence can be added using the enzymatic process used in the construction of the library.

In one embodiment, the same capping sequence is ligated to every member of the library. In another embodiment, a plurality of capping sequences are used. In this embodiment, oligonucleotide capping sequences containing variable bases are, for example, ligated onto library members following the final synthetic cycle. In one embodiment, following the final synthetic cycle, the fractions are pooled and then split into fractions again, with each fraction having a different capping sequence added. Alternatively, multiple capping sequences can be added to the pooled library following the final synthesis cycle. In both embodiments, the final library members will include molecules comprising specific functional moieties linked to identifying oligonucleotides including two or more different capping sequences.

In one embodiment, the capping primer comprises an oligonucleotide sequence containing variable, i.e., degenerate, nucleotides. Such degenerate bases within the capping primers permit the identification of library molecules of interest by determining whether a combination of building blocks is the consequence of PCR duplication (identical sequence) or of independent occurrences of the molecule (different sequences). In this way, such degenerate bases may reduce the number of false positives identified during the biological screening of the encoded library.
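The duplicate-versus-independent distinction can be sketched as a read-counting step in Python. The sketch assumes each sequencing read has already been parsed into a building-block code and its degenerate capping tag; the codes, tags and read counts below are hypothetical. With a finite number of possible tags (for example 4⁵ = 1024), two independent molecules occasionally share a tag by chance, so the distinct-tag count is a lower bound on independent occurrences.

```python
from collections import defaultdict

def occurrence_summary(reads):
    """Summarize reads per building-block combination.

    reads: iterable of (building_block_code, degenerate_tag) pairs.
    Returns code -> (total reads, distinct degenerate tags). Identical tags are
    treated as PCR copies of one molecule; distinct tags as independent molecules.
    """
    totals = defaultdict(int)
    tags = defaultdict(set)
    for code, tag in reads:
        totals[code] += 1
        tags[code].add(tag)
    return {code: (totals[code], len(tags[code])) for code in totals}

reads = [("bbA-bbB-bbC", "ACGTA")] * 100 + [
    ("bbD-bbE-bbF", tag) for tag in ("AACCG", "GTTAC", "CAGTA", "TTGCA", "GACAT")
]
summary = occurrence_summary(reads)
print(summary)
```

Here the first combination has many reads but a single tag, consistent with PCR duplication of one molecule, while the second has fewer reads that nonetheless represent five independent molecules.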

In one embodiment, a degenerate capping primer comprises or has the following sequence:

where N can be any of the four bases, permitting 1024 different sequences (4⁵). After its ligation onto the library and primer extension, the primer has the following sequence (upper strand 5′→3′, lower strand 3′→5′; N′ denotes the base complementary to N):

5′-   CAGCGTTCGA N′N′N′N′N′ CAGACAAGCTTCACCTGC-3′
3′-AA GTCGCAAGCT NNNNN GTCTGTTCGAAGTGGACG-5′
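The duplex above can be checked programmatically. This Python sketch takes the two strands exactly as shown, models the filled-in N′ bases as the complements of the lower-strand N bases, confirms that every upper-strand base pairs with the antiparallel lower-strand base once the 2-nt AA overhang is skipped, and counts the 4⁵ possible degenerate variants. The helper names are hypothetical.

```python
from itertools import product

_PAIR = str.maketrans("ACGT", "TGCA")

TOP_5P = "CAGCGTTCGA"                                     # upper strand, 5' of the degenerate region
TOP_3P = "CAGACAAGCTTCACCTGC"                             # upper strand, 3' of the degenerate region
BOTTOM_3TO5 = "AA" + "GTCGCAAGCT" + "{N}" + "GTCTGTTCGAAGTGGACG"  # lower strand, written 3'->5'

def filled_duplex(n_bases: str):
    """Upper strand (5'->3') and lower strand (3'->5') for one choice of the five N bases.

    The upper-strand N' bases are modeled as complements of the lower-strand N bases,
    as produced by primer extension.
    """
    top = TOP_5P + n_bases.translate(_PAIR) + TOP_3P
    bottom = BOTTOM_3TO5.format(N=n_bases)
    return top, bottom

def base_paired(top: str, bottom_3to5: str, overhang: int = 2) -> bool:
    """True if each upper-strand base pairs with the lower-strand base beneath it."""
    paired = bottom_3to5[overhang:]
    return len(top) == len(paired) and top.translate(_PAIR) == paired

variants = ["".join(p) for p in product("ACGT", repeat=5)]
print(len(variants))
```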

In another embodiment, the capping primer comprises or has the following sequence:

where B can be any of C, G or T, permitting 19,683 different sequences (3⁹). The design of the degenerate region in this primer improves DNA sequence analysis, as the A bases that flank and punctuate the degenerate B bases prevent homopolymeric stretches of greater than 3 bases, and facilitate sequence alignment.
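The two properties claimed for this design, 3⁹ variants and no homopolymer longer than 3 bases, can be checked by enumeration. Because the primer sequence itself is not reproduced here, the Python sketch uses a hypothetical layout, `ABBBABBBABBBA`, in which A bases flank and punctuate nine B positions; it is an illustration of the principle, not the actual primer. An unpunctuated run of nine B positions is included for comparison.

```python
from itertools import groupby, product

def max_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    return max(len(list(run)) for _, run in groupby(seq))

def expand(pattern: str, fill) -> str:
    """Substitute successive bases from `fill` into the 'B' positions of `pattern`."""
    bases = iter(fill)
    return "".join(next(bases) if ch == "B" else ch for ch in pattern)

def enumerate_degenerate(pattern: str, alphabet: str = "CGT"):
    """All concrete sequences obtained when each B is any of C, G or T."""
    n = pattern.count("B")
    return [expand(pattern, combo) for combo in product(alphabet, repeat=n)]

PATTERN = "ABBBABBBABBBA"  # hypothetical layout, not the source's actual primer
variants = enumerate_degenerate(PATTERN)
print(len(variants), max(map(max_homopolymer, variants)))
```

Since B never takes the value A, each punctuating A stays isolated, and the longest possible run is one full block of three identical B bases; without the punctuating A bases, a run of nine identical bases is possible.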

In one embodiment, the degenerate capping oligonucleotide is ligated to the members of the library using a suitable enzyme, and the upper strand of the degenerate capping oligonucleotide is subsequently polymerized using a suitable enzyme, such as a DNA polymerase.

In another embodiment, the PCR priming sequence is a “universal adaptor” or “universal primer”. As used herein, a “universal adaptor” or “universal primer” is an oligonucleotide that contains a unique PCR priming region that is, for example, about 5, 7, 10, 13, 15, 17, 20, 22, or 25 nucleotides in length, and is located adjacent to a unique sequencing priming region that is, for example, about 5, 7, 10, 13, 15, 17, 20, 22, or 25 nucleotides in length, and is optionally followed by a unique discriminating key sequence (or sample identifier sequence) consisting of at least one of each of the four deoxyribonucleotides (i.e., A, C, G, T).

As used herein, the term “discriminating key sequence” or “sample identifier sequence” refers to a sequence that may be used to uniquely tag a population of molecules from a sample. Multiple samples, each containing a unique sample identifier sequence, can be mixed, sequenced and re-sorted after DNA sequencing for analysis of individual samples. The same discriminating sequence can be used for an entire library or, alternatively, different discriminating key sequences can be used to track different libraries. In one embodiment, the discriminating key sequence is on either the 5′ PCR primer, the 3′ PCR primer, or on both primers. If both PCR primers contain a sample identifier sequence, the number of different samples that can be pooled with unique sample identifier sequences is the product of the number of sample identifier sequences on each primer. Thus, 10 different 5′ sample identifier sequence primers can be combined with 10 different 3′ sample identifier sequence primers to yield 100 different sample identifier sequence combinations.

Non-limiting examples of 5′ and 3′ unique PCR primers containing discriminating key sequences include the following.

5′ primers (variable key positions set off by spaces):
5′ A - GCCTTGCCAGCCCGCTCAG A TGACTCCCAAATCGATGTG;
5′ C - GCCTTGCCAGCCCGCTCAG C TGACTCCCAAATCGATGTG;
5′ G - GCCTTGCCAGCCCGCTCAG G TGACTCCCAAATCGATGTG;
5′ T - GCCTTGCCAGCCCGCTCAG T TGACTCCCAAATCGATGTG;
5′ AA - GCCTTGCCAGCCCGCTCAG AA TGACTCCCAAATCGATGTG;
5′ AC - GCCTTGCCAGCCCGCTCAG AC TGACTCCCAAATCGATGTG;
5′ AG - GCCTTGCCAGCCCGCTCAG AG TGACTCCCAAATCGATGTG;
5′ AT - GCCTTGCCAGCCCGCTCAG AT TGACTCCCAAATCGATGTG; and
5′ CA - GCCTTGCCAGCCCGCTCAG CA TGACTCCCAAATCGATGTG.

3′ SID primers (variable key positions set off by spaces):
3′ A - GCCTCCCTCGCGCCATCAG A GCAGGTGAAGCTTGTCTG;
3′ C - GCCTCCCTCGCGCCATCAG C GCAGGTGAAGCTTGTCTG;
3′ G - GCCTCCCTCGCGCCATCAG G GCAGGTGAAGCTTGTCTG;
3′ T - GCCTCCCTCGCGCCATCAG T GCAGGTGAAGCTTGTCTG;
3′ AA - GCCTCCCTCGCGCCATCAG AA GCAGGTGAAGCTTGTCTG;
3′ AC - GCCTCCCTCGCGCCATCAG AC GCAGGTGAAGCTTGTCTG;
3′ AG - GCCTCCCTCGCGCCATCAG AG GCAGGTGAAGCTTGTCTG;
3′ AT - GCCTCCCTCGCGCCATCAG AT GCAGGTGAAGCTTGTCTG; and
3′ CA - GCCTCCCTCGCGCCATCAG CA GCAGGTGAAGCTTGTCTG.
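Re-sorting pooled samples after sequencing amounts to reading the key at each end of an amplicon. The Python sketch below assumes full-length, error-free reads of the amplicon's top strand, assumes the key sits between the two constant segments of each primer listed above, and reads the 3′ key from the reverse complement because the 3′ primer is incorporated on the opposite strand. Real pipelines must also tolerate sequencing errors, which this sketch does not. With these nine keys at each end, 9 × 9 = 81 samples can be distinguished; with 10 keys at each end, 100.

```python
_PAIR = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_PAIR)[::-1]

FIVE_FLANK, FIVE_TAIL = "GCCTTGCCAGCCCGCTCAG", "TGACTCCCAAATCGATGTG"
THREE_FLANK, THREE_TAIL = "GCCTCCCTCGCGCCATCAG", "GCAGGTGAAGCTTGTCTG"
KEYS = ["A", "C", "G", "T", "AA", "AC", "AG", "AT", "CA"]

def match_key(read: str, flank: str, tail: str):
    """Return the single key k for which read starts with flank + k + tail, else None."""
    hits = [k for k in KEYS if read.startswith(flank + k + tail)]
    return hits[0] if len(hits) == 1 else None

def sample_id(amplicon: str):
    """(5' key, 3' key) for an amplicon top strand read 5'->3', or None if unassigned."""
    k5 = match_key(amplicon, FIVE_FLANK, FIVE_TAIL)
    k3 = match_key(revcomp(amplicon), THREE_FLANK, THREE_TAIL)
    return (k5, k3) if k5 and k3 else None

def demultiplex(amplicons):
    """Group amplicons by their (5' key, 3' key) sample identifier."""
    bins = {}
    for amplicon in amplicons:
        bins.setdefault(sample_id(amplicon), []).append(amplicon)
    return bins

def make_amplicon(k5: str, k3: str, insert: str = "ACGTACGTAC") -> str:
    """Hypothetical amplicon: 5' primer, encoding insert, then the 3' primer's complement."""
    return FIVE_FLANK + k5 + FIVE_TAIL + insert + revcomp(THREE_FLANK + k3 + THREE_TAIL)

bins = demultiplex([make_amplicon("AC", "G"), make_amplicon("A", "AG"), make_amplicon("AC", "G")])
print({key: len(reads) for key, reads in bins.items()})
```

Matching the constant tail after each key keeps keys of different lengths, such as A and AA, from being confused.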

In one embodiment, the discriminating key sequence is about 4, 5, 6, 7, 8, 9, or 10 nucleotides in length. In another embodiment, the discriminating key sequence is a combination of about 1-4 nucleotides. In yet another embodiment, each universal adaptor is about forty-four nucleotides in length. In one embodiment, the universal adaptors are ligated, using T4 DNA ligase, onto the end of the encoding oligonucleotide. Different universal adaptors may be designed specifically for each library preparation and will, therefore, provide a unique identifier for each library. The size and sequence of the universal adaptors may be modified as deemed necessary by one of skill in the art.

In one embodiment, the universal adaptor added as a capping sequence is linked to a support binding moiety. For example, a 5′-biotin is added to the universal adaptor to allow, for example, isolation of single-stranded DNA template as well as non-covalent coupling of the universal adaptor to the surface of a solid support that is saturated with a biotin-binding protein (e.g., streptavidin, neutravidin or avidin). Other linkages are well known in the art and may be used in place of biotin-streptavidin (for example, antibody/antigen-epitope, receptor/ligand, and oligonucleotide pairing or complementarity).

In another embodiment, the capping sequence contains anchor primer sequences such that the members of the library may be attached to a solid substrate. In one embodiment, the anchor primer sequences are annealed to the capping sequences using recognized techniques (see, e.g., Hatch, et al. (1999) Genet. Anal. Biomol. Eng. 15:35-40; U.S. Pat. No. 5,714,320; and U.S. Pat. No. 5,854,033). In general, any procedure for annealing the anchor primers to the capping sequences is suitable as long as it results in formation of specific, i.e., perfect or nearly perfect, complementarity between the adaptor region or regions in the anchor primer sequence and a sequence present in the capping sequences. The anchoring of the encoding oligonucleotide to the solid surface may be reversible or irreversible, e.g., the anchor to the solid surface may be cleavable or non-cleavable.

In one embodiment, the universal primer is annealed to a solid support that contains oligonucleotide capture primers that are complementary to the PCR priming regions of the universal adaptor ends.

In one embodiment, the solid support is a bead, for example, a Sepharose bead. The beads may be of any convenient size and fabricated from any number of known materials. Examples of such materials include inorganics, natural polymers, and synthetic polymers. Specific examples of these materials include cellulose, cellulose derivatives, acrylic resins, glass, silica gels, polystyrene, gelatin, polyvinyl pyrrolidone, co-polymers of vinyl and acrylamide, polystyrene cross-linked with divinylbenzene or the like (see Merrifield (1964) Biochemistry 3:1385-1390), polyacrylamides, latex gels, polystyrene, dextran, rubber, silicon, plastics, nitrocellulose, celluloses, natural sponges, silica gels, glass, metals, plastic, cellulose, cross-linked dextrans (e.g., Sephadex®) and agarose gel (Sepharose®), and other solid phase supports known to those of skill in the art.

The encoding oligonucleotides may be attached to the solid support capture bead (“DNA capture bead”) in any manner known in the art. Any suitable coupling approach known in the art can be used, for example: using a water-soluble carbodiimide to link the 5′-phosphate on the DNA to amine-coated capture beads through a phosphoramidate bond; coupling specific oligonucleotide linkers to the bead using similar chemistry and then using DNA ligase to link the DNA to the linker on the bead; or joining the oligonucleotide to the beads using N-hydroxysuccinimide (NHS) and its derivatives, such that one end of the linker contains a reactive group (such as an amide group) which forms a covalent bond with the solid support, while the other end of the linker contains a second reactive group that can bond with the oligonucleotide to be immobilized.

In another embodiment, the oligonucleotide is bound to the DNA capture bead by a non-covalent linkage, such as chelation or an antigen-antibody complex. Oligonucleotide linkers can be employed which specifically hybridize to unique sequences at the end of the DNA fragment, such as the overlapping end from a restriction enzyme site or the “sticky ends” of bacteriophage lambda-based cloning vectors, but blunt-end ligations can also be used beneficially. These methods are described in detail in U.S. Pat. No. 5,674,743. It is preferred that any method used to immobilize the oligonucleotide on the beads continues to bind the immobilized oligonucleotide throughout the steps in the methods of the invention.

In one embodiment, the oligonucleotide is attached to a solid support manufactured from, for example, glass, plastic, a nylon membrane, a gel matrix, ceramics, silica, silicon, or any other non-reactive material as described in U.S. Pat. No. 6,787,308, the entire contents of which are incorporated by reference. The supports generally comprise a flat, i.e., planar, surface, or at least an array in which the molecules to be analysed are in the same plane. The oligonucleotide may be attached by specific covalent or non-covalent interactions. In one embodiment of the invention, the surface of a solid support is coated with streptavidin or avidin. In another embodiment of the invention, the solid surface is coated with an epoxide and the molecules are coupled via an amine linkage. In yet another embodiment, the encoding oligonucleotide may be attached to a solid support via hybridization to a complementary nucleic acid molecule previously attached to the solid support.

In one embodiment, the solid support is pretreated to create surface chemistry that facilitates oligonucleotide attachment and subsequent sequence analysis. In one embodiment, the solid support is coated with a polyelectrolyte multilayer (PEM). In another embodiment, the encoding oligonucleotide is attached to the surface of a microfabricated channel or to the surface of reaction chambers that are disposed along a microfabricated flow channel, optionally with streptavidin-biotin links. Each of these attachment methods is described in PCT Publication No. WO 2005/080605, the entire contents of which are incorporated by reference.

In one embodiment, the encoding oligonucleotide is attached to a solid surface at high density and at single-molecule resolution. In one embodiment, the encoding oligonucleotide is attached to a solid surface at an individually addressable location (see, e.g., PCT Publication No. WO 2005/080605).

Attachment of the encoding oligonucleotide to any suitable solid surface can occur prior to the hybridization of a primer for amplification and/or sequencing, or alternatively, the encoding oligonucleotide can be attached to any suitable solid surface after the hybridization of a primer for amplification and/or sequencing.

In another embodiment, the oligonucleotide is attached to a particle, such as a microsphere, which is itself attached to a solid support. The microspheres may be of any suitable size, typically in the range of from 10 nm to 100 nm in diameter.

In one embodiment, the universal adaptors are not 5′-phosphorylated, so ligation leaves a nick where the unphosphorylated 5′ end meets the adjacent strand. These “gaps” or “nicks” can be filled in by using a DNA polymerase that can bind to, strand-displace and extend the nicked DNA fragments according to techniques recognized in the art. DNA polymerases that lack 3′→5′ exonuclease activity but exhibit 5′→3′ exonuclease activity have the ability to recognize nicks, displace the nicked strands, and extend the strand in a manner that results in the repair of the nicks and in the formation of non-nicked double-stranded DNA (Hamilton, et al. (2001) BioTechniques 31:370).

Several modifying enzymes can be used for the nick repair step, including, but not limited to, polymerases, ligases and kinases. DNA polymerases that can be used for this application include, for example, E. coli DNA pol I, Thermoanaerobacter thermohydrosulfuricus pol I, and bacteriophage phi29 DNA polymerase. In one embodiment, the strand-displacing enzyme Bacillus stearothermophilus pol I (Bst DNA polymerase I) is used to repair the nicked dsDNA, resulting in non-nicked dsDNA. In another embodiment, the ligase is T4 DNA ligase and the kinase is polynucleotide kinase.

The invention further relates to the compounds which can be produced using the methods of the invention, and collections of such compounds, either as isolated species or pooled to form a library of chemical structures. Compounds of the invention include compounds of the formula

where X is a functional moiety comprising one or more building blocks, Z is an oligonucleotide attached at its 3′ terminus to B, and Y is an oligonucleotide which is attached to C at its 5′ terminus. A is a functional group that forms a covalent bond with X, B is a functional group that forms a bond with the 3′-end of Z, and C is a functional group that forms a bond with the 5′-end of Y. D, F and E are chemical groups that link functional groups A, C and B to S, which is a core atom or scaffold. Preferably, D, E and F are each independently a chain of atoms, such as an alkylene chain or an oligo(ethylene glycol) chain; D, E and F can be the same or different, and are preferably effective to allow hybridization of the two oligonucleotides and synthesis of the functional moiety.

Preferably, Y and Z are substantially complementary and are oriented in the compound so as to enable Watson-Crick base pairing and duplex formation under suitable conditions. Y and Z can be the same length or different lengths. Preferably, Y and Z are the same length, or one of Y and Z is from 1 to 10 bases longer than the other. In a preferred embodiment, Y and Z are each 10 or more bases in length and have complementary regions of ten or more base pairs. More preferably, Y and Z are substantially complementary throughout their length, i.e., they have no more than one mismatch per every ten base pairs. Most preferably, Y and Z are complementary throughout their length, i.e., except for any overhang region on Y or Z, the strands hybridize via Watson-Crick base pairing with no mismatches throughout their entire length.
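The "no more than one mismatch per every ten base pairs" criterion can be checked with a short Python function. This sketch assumes Y and Z are given 5′→3′, have equal length with no overhang, and are paired antiparallel along their full length; the overall mismatch rate is compared against the one-in-ten limit. The sequences are hypothetical.

```python
_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def antiparallel_mismatches(y: str, z: str) -> int:
    """Count mismatches when y and z (both 5'->3') are paired antiparallel."""
    if len(y) != len(z):
        raise ValueError("this sketch assumes equal-length strands with no overhang")
    # y[i] pairs with the base at the same distance from z's 3' end.
    return sum(1 for a, b in zip(y, reversed(z)) if _PAIR[a] != b)

def substantially_complementary(y: str, z: str) -> bool:
    """At most one mismatch per ten base pairs."""
    return antiparallel_mismatches(y, z) * 10 <= len(y)

y = "ACGTTGCAAC"
perfect = "GTTGCAACGT"     # exact reverse complement of y
one_off = "GTTGCAACGA"     # one mismatch
two_off = "CTTGCAACGA"     # two mismatches
print([antiparallel_mismatches(y, z) for z in (perfect, one_off, two_off)])
```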

S can be a single atom or a molecular scaffold. For example, S can be a carbon atom, a boron atom, a nitrogen atom or a phosphorus atom, or a polyatomic scaffold, such as a phosphate group or a cyclic group, such as a cycloalkyl, cycloalkenyl, heterocycloalkyl, heterocycloalkenyl, aryl or heteroaryl group. In one embodiment, the linker is a group of the structure

where each of n, m and p is, independently, an integer from 1 to about 20, preferably from 2 to 8, and more preferably from 3 to 6. In one particular embodiment, the linker has the structure shown below.

In one embodiment, the libraries of the invention include molecules consisting of a functional moiety composed of building blocks, where each functional moiety is operatively linked to an encoding oligonucleotide. The nucleotide sequence of the encoding oligonucleotide is indicative of the building blocks present in the functional moiety and, in some embodiments, the connectivity or arrangement of the building blocks. The invention provides the advantage that the methodology used to construct the functional moiety and that used to construct the oligonucleotide tag can be performed in the same reaction medium, preferably an aqueous medium, thus simplifying the method of preparing the library compared to methods in the prior art. In certain embodiments in which the oligonucleotide ligation steps and the building block addition steps are both conducted in aqueous media, the two reactions will have different pH optima. In these embodiments, the building block addition reaction can be conducted at a suitable pH and temperature in a suitable aqueous buffer. The buffer can then be exchanged for an aqueous buffer which provides a suitable pH for oligonucleotide ligation.

In another embodiment, the invention provides compounds, and libraries comprising such compounds, of Formula II:

$$Z{-}L{-}A_t{-}X(Y)_n \qquad \text{(II)}$$

where X is a molecular scaffold, each Y is, independently, a peripheral moiety, and n is an integer from 1 to 6. Each A is, independently, a building block, and t is an integer from 0 to about 5. L is a linking moiety and Z is a single-stranded or double-stranded oligonucleotide which identifies the structure $-A_t{-}X(Y)_n$. The structure $X(Y)_n$ can be, for example, one of the scaffold structures set forth in Table 8 (see below). In one embodiment, the invention provides compounds, and libraries comprising such compounds, of Formula III:

where t is an integer from 0 to about 5, preferably from 0 to 3, and each A is, independently, a building block. L is a linking moiety and Z is a single-stranded or double-stranded oligonucleotide which identifies each A and R₁, R₂, R₃ and R₄. R₁, R₂, R₃ and R₄ are each independently a substituent selected from hydrogen, alkyl, substituted alkyl, heteroalkyl, substituted heteroalkyl, cycloalkyl, heterocycloalkyl, substituted cycloalkyl, substituted heterocycloalkyl, aryl, substituted aryl, arylalkyl, heteroarylalkyl, substituted arylalkyl, substituted heteroarylalkyl, heteroaryl, substituted heteroaryl, alkoxy, aryloxy, amino, and substituted amino. In one embodiment, each A is an amino acid residue.

Libraries which include compounds of Formula II or Formula III can comprise at least about 100; 1000; 10,000; 100,000; 1,000,000 or 10,000,000 compounds of Formula II or Formula III. In one embodiment, the library is prepared via a method designed to produce a library comprising at least about 100; 1000; 10,000; 100,000; 1,000,000 or 10,000,000 compounds of Formula II or Formula III.

TABLE 8: Scaffolds (columns: Amine; Aldehyde/Ketone; Carboxylic acid; Other; Reference)

Reagent entries in the table include amines (e.g., a wide range of primary aliphatic amines; R1—NH₂, R2—NH₂), aldehydes (e.g., benzaldehydes and furfural; R1—CHO, R2—CHO), R1—HS, ≡N—R3 and ≡N—R4 reagents, carboxylic acids (R2—COOH), amino acids, and amino acid esters.

References cited in Table 8:
Carranco, I., et al. (2005) J. Comb. Chem. 7:33-41
Rosamilia, A. E., et al. (2005) Org. Lett. 7:1525-1528
Syeda Huma, H. Z., et al. (2002) Tet. Lett. 43:6485-6488
Tempest, P., et al. (2001) Tet. Lett. 42:4959-4962
Paulvannan, K. (1999) Tet. Lett. 40:1851-1854
Tempest, P., et al. (2001) Tet. Lett. 42:4963-4968
Tempest, P., et al. (2003) Tet. Lett. 44:1947-1950
Nefzi, A., et al. (1999) Tet. Lett. 40:4939-4942
Bose, A. K., et al. (2005) Tet. Lett. 46:1901-1903
Stadler, A. and Kappe, C. O. (2001) J. Comb. Chem. 3:624-630; Lengar, A. and Kappe, C. O. (2004) Org. Lett. 6:771-774
Ivachtchenko, A. V., et al. (2003) J. Comb. Chem. 5:775-788
Micheli, F., et al. (2001) J. Comb. Chem. 3:224-228
Sternson, S. M., et al. (2001) Org. Lett. 3:4239-4242
Cheng, W.-C., et al. (2002) J. Org. Chem. 67:5673-5677; Park, K.-H., et al. (2001) J. Comb. Chem. 3:171-176
Brown, B. J., et al. (2000) Synlett 1:131-133
Kilburn, J. P., et al. (2001) Tet. Lett. 42:2583-2586
del Fresno, M., et al. (1998) Tet. Lett. 39:2639-2642
Alvarez-Gutierrez, J. M., et al. (2000) Tet. Lett. 41:609-612
Rinnová, M., et al. (2002) J. Comb. Chem. 4:209-213
Makara, G. M., et al. (2002) Org. Lett. 4:1751-1754
Schell, P., et al. (2005) J. Comb. Chem. 7:96-98
Feliu, L., et al. (2003) J. Comb. Chem. 5:356-361
Hiroshige, M., et al. (1995) J. Am. Chem. Soc. 117:11590-11591

One advantage of the methods of the invention is that they can be used to prepare libraries comprising vast numbers of compounds. The ability to amplify encoding oligonucleotide sequences using known methods such as polymerase chain reaction (“PCR”) means that selected molecules can be identified even if relatively few copies are recovered. This allows the practical use of very large libraries, which, as a consequence of their high degree of complexity, either comprise relatively few copies of any given library member or require the use of very large volumes. For example, a library consisting of 10⁸ unique structures in which each structure has 1×10¹² copies (about 1 picomole) requires about 100 L of solution at 1 μM effective concentration (total library concentration). For the same library, if each member is represented by 1,000,000 copies, the volume required is 100 μL at 1 μM effective concentration.
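The volume estimate follows from converting copies to moles and dividing by the total library concentration. This Python sketch assumes, as the figures above imply, that 1 μM refers to the concentration of all library molecules combined. The text rounds 10¹² copies to about 1 pmol; using Avogadro's constant exactly (10¹² copies ≈ 1.66 pmol) gives roughly 166 L and 166 μL, the same order of magnitude as the stated 100 L and 100 μL.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)

def library_volume_litres(members: float, copies_per_member: float, total_molar: float) -> float:
    """Volume needed to hold every copy of every member at a given total concentration."""
    total_moles = members * copies_per_member / AVOGADRO
    return total_moles / total_molar

big = library_volume_litres(1e8, 1e12, 1e-6)    # 10^8 members x 10^12 copies at 1 uM
small = library_volume_litres(1e8, 1e6, 1e-6)   # 10^8 members x 10^6 copies at 1 uM
print(f"{big:.0f} L, {small * 1e6:.0f} uL")
```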

In a preferred embodiment, the library comprises from about 10³ to about 10¹⁵ copies of each library member. Given differences in efficiency of synthesis among the library members, it is possible that different library members will have different numbers of copies in any given library. Therefore, although the number of copies of each member theoretically present in the library may be the same, the actual number of copies of any given library member is independent of the number of copies of any other member. More preferably, the compound libraries of the invention include at least about 10⁵, 10⁶ or 10⁷ copies of each library member, or of substantially all library members. By “substantially all” library members is meant at least about 85% of the members of the library, preferably at least about 90%, and more preferably at least about 95% of the members of the library.
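As a non-limiting illustration, the “substantially all” criterion can be expressed as a simple check over per-member copy numbers. The sketch below assumes the copy number of each member is known (for example, estimated from synthesis yields); the member names and copy numbers are hypothetical.

```python
def fraction_meeting(copy_counts, min_copies):
    """Fraction of library members present at >= min_copies copies."""
    if not copy_counts:
        raise ValueError("empty library")
    return sum(c >= min_copies for c in copy_counts.values()) / len(copy_counts)

def substantially_all(copy_counts, min_copies, cutoff=0.85):
    """True if at least `cutoff` of the members reach min_copies
    (0.85, 0.90 and 0.95 correspond to the tiers stated in the text)."""
    return fraction_meeting(copy_counts, min_copies) >= cutoff

# Hypothetical copy numbers for a 10-member library with uneven synthesis yields.
yields = [2e7, 5e6, 1.2e6, 8e5, 3e6, 9e6, 4e4, 6e6, 1.5e7, 7e6]
counts = {f"member_{i}": n for i, n in enumerate(yields)}
frac = fraction_meeting(counts, 1e5)
print(frac, substantially_all(counts, 1e5), substantially_all(counts, 1e5, 0.95))
```

Here nine of ten members reach 10⁵ copies, so the library meets the 85% and 90% tiers but not the 95% tier.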

Preferably, the library includes a sufficient number of copies of each member that multiple rounds (i.e., two or more) of selection against a biological target can be performed, with sufficient quantities of binding molecules remaining following the final round of selection to enable amplification of the oligonucleotide tags of the remaining molecules and, therefore, identification of the functional moieties of the binding molecules. A schematic representation of such a selection process is illustrated in FIG. 6, in which 1 and 2 represent library members, B is a target molecule and X is a moiety operatively linked to B that enables the removal of B from the selection medium. In this example, compound 1 binds to B, while compound 2 does not bind to B. The selection process, as depicted in Round 1, comprises (I) contacting a library comprising compounds 1 and 2 with B-X under conditions suitable for binding of compound 1 to B; (II) removing unbound compound 2; and (III) dissociating compound 1 from B and removing B-X from the reaction medium. The result of Round 1 is a collection of molecules that is enriched in compound 1 relative to compound 2. Subsequent rounds employing steps I-III result in further enrichment of compound 1 relative to compound 2. Although three rounds of selection are shown in FIG. 6, in practice any number of rounds may be employed, for example from one round to ten rounds, to achieve the desired enrichment of binding molecules relative to non-binding molecules.
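By way of illustration only, the effect of repeated rounds of steps I-III without re-synthesis can be modeled by assuming that each compound carries a fixed fraction of its copies through each round. The retention fractions below are hypothetical and stand in for binding, washing and elution combined; the sketch tracks expected (mean) copy numbers.

```python
def select_rounds(copies, retention, n_rounds):
    """Expected copy numbers after each of n_rounds selection rounds with no
    re-synthesis in between (steps I-III repeated).

    copies:    compound -> starting copy number
    retention: compound -> fraction of copies carried through one round
               (binding, washing and elution combined; hypothetical values)"""
    history, current = [], dict(copies)
    for _ in range(n_rounds):
        current = {cpd: n * retention[cpd] for cpd, n in current.items()}
        history.append(current)
    return history

start = {"compound_1": 1e6, "compound_2": 1e6}   # equal input copies
keep = {"compound_1": 0.30, "compound_2": 0.01}  # assumed per-round retention
for rnd, pop in enumerate(select_rounds(start, keep, 3), start=1):
    ratio = pop["compound_1"] / pop["compound_2"]
    print(f"Round {rnd}: {pop['compound_1']:.3g} vs {pop['compound_2']:.3g} copies, "
          f"enrichment {ratio:.0f}-fold")
```

Under this model the enrichment after k rounds is (0.30/0.01)^k, i.e., 30-, 900- and 27,000-fold, while the binder itself falls from 10⁶ to about 2.7×10⁴ copies. This is why a sufficient starting copy number is needed for tag amplification after the final round.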

In the embodiment shown in FIG. 6, there is no amplification (synthesis of more copies) of the compounds remaining after any of the rounds of selection. Such amplification can lead to a mixture of compounds which is not consistent with the relative amounts of the compounds remaining after the selection. This inconsistency is due to the fact that certain compounds may be more readily synthesized than other compounds, and thus may be amplified in a manner which is not proportional to their presence following selection. For example, if compound 2 is more readily synthesized than compound 1, the amplification of the molecules remaining after Round 2 would result in a disproportionate amplification of compound 2 relative to compound 1, and a resulting mixture of compounds with a much lower (if any) enrichment of compound 1 relative to compound 2.
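As a hypothetical numerical illustration, suppose that after Round 2 compound 1 is enriched 900-fold over compound 2, and that the pool is then re-synthesized with each compound's yield proportional to its surviving copies multiplied by a relative synthesis efficiency. The efficiencies below are assumptions chosen so that compound 2 is made 100 times more readily than compound 1.

```python
def resynthesize(population, efficiency, total=1e6):
    """Hypothetical re-synthesis between rounds: each compound's share of the
    new pool is proportional to surviving copies x relative synthesis efficiency."""
    weights = {c: n * efficiency[c] for c, n in population.items()}
    scale = total / sum(weights.values())
    return {c: w * scale for c, w in weights.items()}

# Expected copies after Round 2 (compound 1 enriched 900-fold); assumed values.
after_round_2 = {"compound_1": 9e4, "compound_2": 100}
# Assumed: compound 2 is synthesized 100 times more readily than compound 1.
efficiency = {"compound_1": 0.01, "compound_2": 1.0}

before = after_round_2["compound_1"] / after_round_2["compound_2"]
regrown = resynthesize(after_round_2, efficiency)
after = regrown["compound_1"] / regrown["compound_2"]
print(f"enrichment before re-synthesis: {before:.0f}x, after: {after:.0f}x")
```

In this model the enrichment drops from 900-fold to 9-fold: the efficiency ratio divides directly into whatever enrichment selection achieved. Omitting re-synthesis, as in FIG. 6, avoids this loss.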

In one embodiment, the target is immobilized on a solid support by any known immobilization technique. The solid support can be, for example, a water-insoluble matrix contained within a chromatography column, or a membrane. The encoded library can be applied to such a water-insoluble matrix contained within a chromatography column. The column is then washed to remove non-specific binders. Target-bound compounds can then be dissociated by changing the pH, salt concentration, or organic solvent concentration, or by other methods, such as competition with a known ligand of the target.

In another embodiment, the target is free in solution and is incubated with the encoded library. Compounds which bind to the target (also referred to herein as “ligands”) are selectively isolated by a size separation step such as gel filtration or ultrafiltration. In one embodiment, the mixture of encoded compounds and the target biomolecule is passed through a size exclusion chromatography column (gel filtration), which separates any ligand-target complexes from the unbound compounds. The ligand-target complexes are transferred to a reverse-phase chromatography column, which dissociates the ligands from the target. The dissociated ligands are then analyzed by PCR amplification and sequence analysis of the encoding oligonucleotides. This approach is particularly advantageous in situations where immobilization of the target may result in a loss of activity.

Accordingly, in one aspect of the invention, methods are provided for identifying one or more compounds in a library of compounds, produced as described herein, that bind to a biological target, and for subsequently determining the structure of the functional moieties of the member(s) of the library of compounds that bind to the biological target.

For example, in one embodiment, one or more compounds which bind to a biological target can be identified by a method comprising the steps of:

(A) synthesizing a library of compounds, wherein the compounds comprise a functional moiety comprising two or more building blocks which is operatively linked to an initial oligonucleotide which identifies the structure of the functional moiety, by:

(i) providing a solution comprising m initiator compounds, wherein m is an integer of 1 or greater, where the initiator compounds consist of a functional moiety comprising n building blocks, where n is an integer of 1 or greater, which is operatively linked to an initial oligonucleotide which identifies the n building blocks;

(ii) dividing the solution of step (i) into r reaction vessels, wherein r is an integer of 2 or greater, thereby producing r aliquots of the solution;

(iii) reacting the initiator compounds in each reaction vessel with one of r building blocks, thereby producing r aliquots comprising compounds consisting of a functional moiety comprising n+1 building blocks operatively linked to the initial oligonucleotide; and

(iv) reacting the initial oligonucleotide in each aliquot with one of a set of r distinct incoming oligonucleotides in the presence of an enzyme which catalyzes the ligation of the incoming oligonucleotide and the initial oligonucleotide, under conditions suitable for enzymatic ligation of the incoming oligonucleotide and the initial oligonucleotide, thereby producing r aliquots of molecules consisting of a functional moiety comprising n+1 building blocks operatively linked to an elongated oligonucleotide which encodes the n+1 building blocks;

(B) contacting the biological target with the library of compounds, or a portion thereof, under conditions suitable for at least one member of the library of compounds to bind to the target;

(C) removing library members that do not bind to the target;

(D) sequencing the encoding oligonucleotides of the at least one member of the library of compounds which binds to the target; and

(E) using the sequences determined in step (D) to determine the structure of the functional moieties of the members of the library of compounds which bind to the biological target, thereby identifying one or more compounds which bind to the biological target.
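By way of illustration only, the bookkeeping of step (A) and the decoding of steps (D) and (E) can be sketched in Python. The sketch tracks distinct species rather than copy numbers, treats enzymatic ligation as string concatenation, and assumes fixed-length codons within each cycle and a known length for the initial oligonucleotide. All building-block names, codons and reads are hypothetical.

```python
from collections import Counter

def split_and_pool(initiators, cycles):
    """Step (A): repeat sub-steps (i)-(iv) once per cycle.

    initiators: list of (building_blocks, tag) pairs; tag is the initial
                oligonucleotide written as a DNA string.
    cycles:     one dict per cycle mapping building-block name -> the incoming
                oligonucleotide ("codon") ligated to record it.
    Tracks distinct species only, not copy numbers."""
    pool = list(initiators)
    for cycle in cycles:
        if len(cycle) < 2:
            raise ValueError("each cycle needs r >= 2 building blocks")
        next_pool = []
        for block, codon in cycle.items():        # (ii) one aliquot per block
            for blocks, tag in pool:
                # (iii) add the building block; (iv) ligate its codon to the tag
                next_pool.append((blocks + (block,), tag + codon))
        pool = next_pool                          # pool the r aliquots again
    return pool

def decode_tag(tag, header_len, cycles):
    """Step (E): read the codons back, cycle by cycle, into building blocks.
    Returns None if any codon is not in the corresponding table."""
    pos, blocks = header_len, []
    for cycle in cycles:
        lookup = {codon: block for block, codon in cycle.items()}
        k = len(next(iter(lookup)))               # codons within a cycle share a length
        block = lookup.get(tag[pos:pos + k])
        if block is None:
            return None                           # sequencing error or unknown codon
        blocks.append(block)
        pos += k
    return tuple(blocks)

def tally_hits(reads, header_len, cycles):
    """Count how often each decoded structure appears among sequenced tags."""
    counts = Counter()
    for read in reads:
        structure = decode_tag(read, header_len, cycles)
        if structure is not None:
            counts[structure] += 1
    return counts

# Hypothetical two-cycle library: 3 x 2 = 6 members built on initiator "BB0".
cycles = [{"A1": "AAC", "A2": "AGT", "A3": "ATG"}, {"B1": "CCA", "B2": "CGT"}]
library = split_and_pool([(("BB0",), "ACGT")], cycles)
# Hypothetical reads recovered after steps (B)-(D); the last one contains an
# unrecognized codon and is discarded.
reads = ["ACGTAGTCCA"] * 3 + ["ACGTAACCGT", "ACGTTTTCCA"]
hits = tally_hits(reads, header_len=4, cycles=cycles)
print(len(library), hits.most_common(1))
```

Because every aliquot in a cycle receives a distinct codon, each member carries a unique tag, and the read counts per decoded structure give a simple ranking of candidate binders.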

In one embodiment, the method further comprises ligating a degenerate capping oligonucleotide to the members of the library of compounds in the presence of an enzyme which catalyzes the ligation, and polymerizing the degenerate capping oligonucleotide with an enzyme that catalyzes the polymerization of DNA.

In one embodiment, the method may further comprise amplifying the encoding oligonucleotide of the at least one member of the library of compounds which binds to the target prior to sequencing.

In one embodiment of the invention, the selection and enrichment of the library is monitored using an oligonucleotide array. For example, a library of compounds may be hybridized to a solid surface, such as a chip comprising oligonucleotides, e.g., an Affymetrix oligonucleotide chip, which is subsequently scanned for fluorescence to detect the oligonucleotide tags bound to the surface. This hybridization can be repeated at each successive step of the screening process for identifying a compound with a desired biological activity.

In one embodiment, the library of compounds comprising encoding oligonucleotides, which are optionally attached to capture beads as described above, is emulsified as a heat-stable water-in-oil emulsion to form a microcapsule according to the methods described in PCT Publications WO 2004/069849, WO 2005/003375, and WO 2005/073410. In one embodiment, the emulsion can be generated by suspending the oligonucleotide tag, with or without attached beads, in amplification solution, e.g., forming a “microreactor.” As used herein, the term “amplification solution” means a mixture of reagents sufficient to perform amplification of template DNA. One example of an amplification solution is a PCR amplification solution, which one of skill in the art can readily prepare.

In one embodiment of the invention, the library of compounds comprising encoding oligonucleotides is amplified to increase the copy number of encoding oligonucleotide molecules prior to sequencing. Encoding oligonucleotides may be amplified by any suitable method of DNA amplification including, for example, temperature-cycling polymerase chain reaction (PCR) (see, e.g., Saiki, et al. (1985) Science 230:1350-1354; Gingeras, et al. WO 88/10315; Davey, et al. European Patent Application Publication No. 329,822; Miller, et al. WO 89/06700); ligase chain reaction (see, e.g., Barany (1991) Proc. Natl. Acad. Sci. USA 88:189-193; Barringer, et al. (1990) Gene 89:117-122); transcription-based amplification (see, e.g., Kwoh, et al. (1989) Proc. Natl. Acad. Sci. USA 86:1173-1177); isothermal amplification systems such as self-sustained sequence replication (see, e.g., Guatelli, et al. (1990) Proc. Natl. Acad. Sci. USA 87:1874-1878); the Qβ replicase system (see, e.g., Lizardi, et al. (1988) BioTechnology 6:1197-1202); strand displacement amplification (Walker, et al. (1992) Nucleic Acids Res. 20(7):1691-1696); the methods described by Walker, et al. (1992) Proc. Natl. Acad. Sci. USA 89(1):392-396; the methods described by Kievits, et al. (1991) J. Virol. Methods 35(3):273-286; “RACE” (Frohman, In: PCR Protocols: A Guide to Methods and Applications, Academic Press, NY (1990)); “one-sided PCR” (Ohara, et al. (1989) Proc. Natl. Acad. Sci. USA 86:5673-5677); “di-oligonucleotide” amplification; isothermal amplification (Walker, et al. (1992) Proc. Natl. Acad. Sci. USA 89:392-396); and rolling circle amplification (reviewed in U.S. Pat. No. 5,714,320).

In one embodiment, the library of compounds comprising encoding oligonucleotides is amplified prior to sequence analysis in a manner that minimizes any potential skew in the population distribution of DNA molecules present in the selected library mix. Typically, only a small amount of library is recovered after a selection step, and this material is amplified using PCR prior to sequence analysis. PCR has the potential to produce a skew in the population distribution of DNA molecules present in the selected library mix. This is especially problematic when the number of input molecules is small and the input molecules are poor PCR templates. PCR products produced in early cycles are more efficient templates than the covalent duplex library molecules, and therefore the frequency of these early products in the final amplified population may be much higher than their frequency in the original input template.

Accordingly, in order to minimize this potential PCR skew, in one embodiment of the invention, a population of single-stranded oligonucleotides corresponding to the individual library members is produced by, for example, using one primer in a reaction, followed by PCR amplification using two primers. By doing so, there is a linear accumulation of single-stranded primer-extension product prior to exponential amplification using PCR, and the diversity and distribution of molecules in the accumulated primer-extension product more accurately reflect the diversity and distribution of molecules present in the original input template, since the exponential phase of amplification occurs only after much of the original molecular diversity present is represented in the population of molecules produced during the primer-extension reaction.
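The benefit of linear pre-extension can be illustrated, by way of example only, with a deliberately simple model that is an assumption rather than a description of real kinetics: an original covalent duplex molecule is a poor template that yields a new strand with probability p per cycle, and every strand made from it then amplifies deterministically by a factor r per remaining PCR cycle. A molecule's share of the final product is then a weighted sum of independent Bernoulli events, so its coefficient of variation (CV) across molecules can be computed exactly. The parameter values are hypothetical.

```python
import math

def representation_cv(p_orig, r_prod, n_pcr, m_linear=0):
    """CV of one input molecule's share of the final product.

    Model assumptions: each cycle the original molecule yields one new strand
    with probability p_orig; strands made during PCR cycle c are multiplied by
    r_prod**(n_pcr - c); strands made in the m_linear single-primer cycles are
    not copied until PCR starts, so each is multiplied by r_prod**n_pcr.
    With independent Bernoulli events, mean = p * sum(w) and
    variance = p * (1 - p) * sum(w**2). Weights are scaled by r_prod**-n_pcr,
    which cancels in the CV."""
    weights = [1.0] * m_linear                               # linear-phase strands
    weights += [r_prod ** -c for c in range(1, n_pcr + 1)]   # strands from PCR cycle c
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    return math.sqrt((1 - p_orig) / p_orig) * math.sqrt(s2) / s1

direct = representation_cv(p_orig=0.2, r_prod=1.9, n_pcr=25)
primed = representation_cv(p_orig=0.2, r_prod=1.9, n_pcr=25, m_linear=20)
print(f"CV direct PCR: {direct:.2f}; CV after linear pre-extension: {primed:.2f}")
```

With these assumed values the CV is about 1.11 for direct PCR and about 0.43 after 20 cycles of single-primer extension. In direct PCR the cycle at which a molecule is first copied dominates its final abundance, whereas the linearly accumulated strands all enter the exponential phase together.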

Preferably, DNA amplification is performed by PCR. PCR amplification methods are described in detail in U.S. Pat. Nos. 4,683,192, 4,683,202, 4,800,159, and 4,965,188, and in PCR Technology: Principles and Applications for DNA Amplification, H. Erlich, ed., Stockton Press, New York (1989); and PCR Protocols: A Guide to Methods and Applications, Innis et al., eds., Academic Press, San Diego, Calif. (1990). The contents of all the foregoing documents are incorporated herein by reference. In one embodiment of the invention, PCR amplification of the template is performed on an oligonucleotide tag bound to a bead and encapsulated with a PCR solution comprising all the necessary reagents for a PCR reaction. In another embodiment of the invention, PCR amplification of the template is performed on a soluble oligonucleotide tag (i.e., not bound to a bead) which is encapsulated with a PCR solution comprising all the necessary reagents for a PCR reaction. PCR is subsequently performed by exposing the emulsion to any suitable thermocycling regimen known in the art. In one embodiment, between 30 and 50 cycles, preferably about 40 cycles, of amplification are performed. It is desirable, but not necessary, that the amplification cycles be followed by one or more hybridization and extension cycles. In another embodiment, between 10 and 30 cycles, or about 25 cycles, of hybridization and extension are performed. In one embodiment, the template DNA is amplified until at least about two million to fifty million copies, or about ten million to thirty million copies, of the template DNA are immobilized per bead.
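As a rough, non-limiting check on the copy numbers mentioned above, the minimum number of cycles needed to reach a target copy number can be computed from an assumed constant per-cycle efficiency. The sketch assumes a single starting template per microreactor and ignores the plateau that real reactions reach, so its figures are idealized minimums; the function name is hypothetical.

```python
import math

def cycles_needed(target_copies, start_copies=1, efficiency=1.0):
    """Smallest whole number of cycles n with start * (1 + efficiency)**n >= target,
    assuming a constant per-cycle efficiency (0 < efficiency <= 1) and no plateau."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    if target_copies <= start_copies:
        return 0
    return math.ceil(math.log(target_copies / start_copies) / math.log(1 + efficiency))

for eff in (1.0, 0.9, 0.8):
    # two million and fifty million copies from one template
    print(eff, cycles_needed(2e6, efficiency=eff), cycles_needed(5e7, efficiency=eff))
```

At perfect doubling, 21 and 26 cycles suffice for two million and fifty million copies, respectively; at an assumed 80% efficiency the figures rise to 25 and 31 cycles.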

Following amplification of the encoding oligonucleotide tag, the emulsion is “broken” (also referred to as “demulsification” in the art). There are many well-known methods of breaking an emulsion (see, e.g., U.S. Pat. No. 5,989,892 and references cited therein), and one of skill in the art would be able to select the proper method. For example, the emulsion may be broken by adding additional oil to cause the emulsion to separate into two phases. The oil phase is then removed, and a suitable organic solvent (e.g., hexanes) is added. After mixing, the oil/organic solvent phase is removed. This step may be repeated several times. Finally, the aqueous layer is removed. If the encoding oligonucleotides are attached to beads, the beads are then washed with an organic solvent/annealing buffer mixture, and then washed again in annealing buffer. Suitable organic solvents include alcohols such as methanol, ethanol and the like.

The amplified encoding oligonucleotides may then be resuspended in aqueous solution for use, for example, in a sequencing reaction according to known technologies. (See, e.g., Sanger, F. et al. (1977) Proc. Natl. Acad. Sci. U.S.A. 74:5463-5467; Maxam & Gilbert (1977) Proc Natl Acad Sci USA 74:560-564; Ronaghi, et al. (1998) Science 281:363, 365; Lysov, et al. (1988) Dokl Akad Nauk SSSR 303:1508-1511; Bains & Smith (1988) J Theor Biol 135:303-307; Drmanac, R. et al. (1989) Genomics 4:114-128; Khrapko, et al. (1989) FEBS Lett 256:118-122; Pevzner (1989) J Biomol Struct Dyn 7:63-73; Southern, et al. (1992) Genomics 13:1008-1017.)

If the encoding oligonucleotide attached to a bead is to be used in a pyrophosphate-based sequencing reaction (described, e.g., in U.S. Pat. Nos. 6,274,320, 6,258,568 and 6,210,891, incorporated herein by reference), then it is necessary to remove the second strand of the PCR product and anneal a sequencing primer to the single-stranded template that is bound to the bead.

Briefly, the second strand is melted away using any number of commonly known methods, such as NaOH, low ionic (e.g., salt) strength, or heat processing. Following this melting step, the beads are pelleted and the supernatant is discarded. The beads are resuspended in an annealing buffer, and the sequencing primer is added and annealed to the bead-attached single-stranded template using a standard annealing cycle.

The amplified encoding oligonucleotide, optionally on a bead, may be sequenced either directly or in a different reaction vessel. In one embodiment of the present invention, the encoding oligonucleotide is sequenced directly on the bead by transferring the bead to a reaction vessel and subjecting the DNA to a sequencing reaction (e.g., pyrophosphate or Sanger sequencing). Alternatively, the beads may be isolated and the encoding oligonucleotide may be removed from each bead and sequenced. In either case, the sequencing steps may be performed on each individual bead, and/or the beads that contain no nucleic acid template may be removed prior to distribution to a reaction vessel by, for example, biotin-streptavidin magnetic beads. Other suitable methods to separate beads are described in, for example, Bauer, J. (1999) J. Chromatography B 722:55-69 and in Brody et al. (1999) Applied Physics Lett. 74:144-146.

Once the encoding oligonucleotide tag has been amplified, the sequence of the tag, and ultimately the composition of the selected molecule, can be determined using nucleic acid sequence analysis, a well-known procedure for determining the order of nucleotides in a nucleic acid. Nucleic acid sequence analysis is approached by a combination of (a) physicochemical techniques, based on the hybridization or denaturation of a probe strand plus its complementary target, and (b) enzymatic reactions with polymerases.

The nucleotide sequence of the oligonucleotide tag, comprised of polynucleotides that identify the building blocks that make up the functional moiety as described herein, may be determined by the use of any sequencing method known to one of skill in the art. Suitable methods are described in, for example, Sanger, F. et al. (1977) Proc. Natl. Acad. Sci. U.S.A. 74:5463-5467; Maxam & Gilbert (1977) Proc Natl Acad Sci USA 74:560-564; Ronaghi, et al. (1998) Science 281:363, 365; Lysov, et al. (1988) Dokl Akad Nauk SSSR 303:1508-1511; Bains & Smith (1988) J Theor Biol 135:303-307; Drmanac, R. et al. (1989) Genomics 4:114-128; Khrapko, et al. (1989) FEBS Lett 256:118-122; Pevzner (1989) J Biomol Struct Dyn 7:63-73; and Southern, et al. (1992) Genomics 13:1008-1017.

In a preferred embodiment, the oligonucleotide tags are sequenced using the apparatus and methods described in PCT Publications WO 2004/069849, WO 2005/003375, WO 2005/073410, and WO 2005/054431, the entire contents of each of which are incorporated herein by this reference.

In one embodiment, a region of the sequence product is determined by annealing a sequencing primer to a region of the template nucleic acid, and then contacting the sequencing primer with a DNA polymerase and a known nucleotide triphosphate, i.e., dATP, dCTP, dGTP, dTTP, or an analog of one of these nucleotides, such as, for example, α-thio-dATP. The sequence can be determined by detecting a sequence reaction byproduct, using methods known in the art.

In some embodiments, the nucleotide is modified to contain a disulfide derivative of a hapten, such as biotin. The addition of the modified nucleotide to the nascent primer annealed to an anchored substrate is analyzed by a suitable post-polymerization method. Such methods enable a nucleotide to be identified in a given target position, and the DNA to be sequenced simply and rapidly while avoiding the need for electrophoresis and the use of potentially dangerous radiolabels.

Suitable haptens include, for example, biotin, digoxigenin, the fluorescent dye molecules Cy3 and Cy5, and fluorescein. The attachment of the hapten can occur through linkages via the sugar, the base, and/or the phosphate moiety of the nucleotide. Exemplary means for signal amplification following polymerization and extension of the encoding oligonucleotide include fluorescent, electrochemical and enzymatic means. In one embodiment using enzymatic amplification, the enzyme is one for which light-generating substrates are known, such as, for example, alkaline phosphatase (AP), horseradish peroxidase (HRP), beta-galactosidase, or luciferase, and the means for the detection of these light-generating (chemiluminescent) substrates can include a CCD camera.

A sequencing primer can be of any length or base composition, as long as it is capable of specifically annealing to a region of the nucleic acid template (i.e., the oligonucleotide tag). The oligonucleotide primers of the present invention may be synthesized by conventional technology, e.g., with a commercial oligonucleotide synthesizer and/or by ligating together subfragments that have been so synthesized. No particular structure for the sequencing primer is required so long as it is able to specifically prime a region on the template nucleic acid. The sequencing primer is extended with the DNA polymerase to form a sequence product. The extension is performed in the presence of one or more types of nucleotide triphosphates and, if desired, auxiliary binding proteins. Incorporation of the dNTP is determined by, for example, assaying for the presence of a sequencing byproduct.

In one embodiment, the nucleic acid sequence of the oligonucleotide tag is determined by the use of the polymerase chain reaction (PCR). Briefly, the oligonucleotide tag (optionally attached to a bead) is subjected to a PCR reaction as follows. The appropriate sample is contacted with a PCR primer pair, each member of the pair having a pre-selected nucleotide sequence. The PCR primer pair is capable of initiating primer extension reactions by hybridizing to a PCR primer binding site on the encoding oligonucleotide tag.

The PCR reaction is performed by mixing the PCR primer pair, preferably a predetermined amount thereof, with the nucleic acids of the encoding oligonucleotide tag, preferably a predetermined amount thereof, in a PCR buffer to form a PCR reaction admixture. The admixture is thermocycled for a number of cycles, which is typically predetermined, sufficient for the formation of a PCR reaction product. A sufficient amount of product is one that can be isolated in a sufficient amount to allow for DNA sequence determination.

PCR is typically carried out by thermocycling, i.e., repeatedly increasing and decreasing the temperature of a PCR reaction admixture within a temperature range whose lower limit is about 30° C. to about 55° C. and whose upper limit is about 90° C. to about 100° C. The increasing and decreasing can be continuous, but is preferably phasic, with time periods of relative temperature stability at each of the temperatures favoring polynucleotide synthesis, denaturation and hybridization.

The PCR reaction is performed using any suitable method. Generally, it occurs in a buffered aqueous solution, i.e., a PCR buffer, preferably at a pH of 7-9. Preferably, a molar excess of the primer is present. A large molar excess is preferred to improve the efficiency of the process.

The PCR buffer also contains the deoxyribonucleotide triphosphates (polynucleotide synthesis substrates) dATP, dCTP, dGTP, and dTTP and a polymerase, typically thermostable, all in adequate amounts for the primer extension (polynucleotide synthesis) reaction. The resulting solution (PCR admixture) is heated to about 90° C.-100° C. for about 1 to 10 minutes, preferably from 1 to 4 minutes. After this heating period, the solution is allowed to cool to 54° C., which is preferable for primer hybridization. The synthesis reaction may occur at a temperature ranging from room temperature up to a temperature above which the polymerase (inducing agent) no longer functions efficiently. Thus, for example, if a non-thermostable polymerase such as E. coli DNA polymerase I is used, the temperature is generally no greater than about 40° C. The thermocycling is repeated until the desired amount of PCR product is produced. An exemplary PCR buffer comprises the following reagents: 50 mM KCl; 10 mM Tris-HCl at pH 8.3; 1.5 mM MgCl₂; 0.001% (wt/vol) gelatin; 200 μM dATP; 200 μM dTTP; 200 μM dCTP; 200 μM dGTP; and 2.5 units of Thermus aquaticus (Taq) DNA polymerase I per 100 microliters of buffer.
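By way of illustration only, the exemplary buffer can be converted into pipetting volumes for a 100 μL reaction using C₁V₁ = C₂V₂. The stock concentrations below (1 M KCl, 1 M Tris-HCl, 25 mM MgCl₂, 0.1% gelatin, 10 mM of each dNTP, and Taq at 5 U/μL) are assumptions, not values taken from the text. Primers and template are omitted; their volumes would come out of the water.

```python
def pcr_mix(final_volume_ul, components):
    """Volumes (uL) of stock solutions for one reaction, via C1*V1 = C2*V2.

    components: name -> (stock_conc, final_conc), both in the same unit for
    a given component (mM, uM, % w/v, U/uL ...). Water makes up the rest."""
    volumes = {}
    for name, (stock, final) in components.items():
        if final > stock:
            raise ValueError(f"{name}: stock is weaker than the target concentration")
        volumes[name] = final * final_volume_ul / stock
    water = final_volume_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks are too dilute to fit in the final volume")
    volumes["water"] = water
    return volumes

# Final concentrations from the exemplary buffer; stock strengths are assumptions.
buffer = {
    "KCl (mM)": (1000, 50),
    "Tris-HCl pH 8.3 (mM)": (1000, 10),
    "MgCl2 (mM)": (25, 1.5),
    "gelatin (% w/v)": (0.1, 0.001),
    "dATP (uM)": (10000, 200),
    "dCTP (uM)": (10000, 200),
    "dGTP (uM)": (10000, 200),
    "dTTP (uM)": (10000, 200),
    "Taq polymerase (U/uL)": (5, 2.5 / 100),  # 2.5 U per 100 uL
}
for name, vol in pcr_mix(100, buffer).items():
    print(f"{name:24s} {vol:6.2f} uL")
```

With these stocks the additions total 21.5 μL (including 6 μL of MgCl₂ and 0.5 μL of enzyme), leaving 78.5 μL of water before primers and template are accounted for.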

Suitable enzymes for elongating the primer sequences include, for example, E. coli DNA polymerase I, Taq DNA polymerase, the Klenow fragment of E. coli DNA polymerase I, T4 DNA polymerase, other available DNA polymerases, reverse transcriptase, and other enzymes, including heat-stable enzymes, which will facilitate combination of the nucleotides in the proper manner to form the primer extension products which are complementary to each nucleic acid strand. Generally, synthesis is initiated at the 3′ end of each primer and proceeds toward the 5′ end of the template strand (i.e., the new strand is extended in the 5′ to 3′ direction) until synthesis terminates, producing molecules of different lengths. The newly synthesized DNA strand and its complementary strand form a double-stranded molecule which can be used in the succeeding steps of the analysis process.

In one embodiment, the nucleotide sequence of the oligonucleotide tag is determined by measuring inorganic pyrophosphate (PPi) liberated from a nucleotide triphosphate (dNTP) as the dNMP is incorporated into an extended sequence primer. This method of sequencing, termed Pyrosequencing™ technology (PyroSequencing AB, Stockholm, Sweden), can be performed in solution (liquid phase) or as a solid-phase technique. PPi-based sequencing methods are described in, e.g., U.S. Pat. Nos. 6,274,320, 6,258,568 and 6,210,891; WO 98/13523; Ronaghi, et al. (1996) Anal Biochem. 242:84-89; Ronaghi, et al. (1998) Science 281:363-365; and U.S. Patent Application Publication No. 2001/0024790. These disclosures of PPi sequencing are incorporated herein in their entirety by reference.
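The logic of PPi-based sequencing can be sketched, by way of example only, as an idealized flowgram. One dNTP species is supplied per flow in a fixed cyclic order, and the polymerase extends through an entire homopolymer run, releasing one PPi per incorporated base, so the light signal is taken to be proportional to the run length. The flow order “TACG”, the noiseless signal model and the rounding-based base caller are assumptions; this is not PyroSequencing AB's software. The sketch works directly on the strand being synthesized (the complement of the tag template).

```python
def flowgram(strand, flow_order="TACG"):
    """Idealized signal per nucleotide flow while synthesizing `strand`:
    the number of bases incorporated (homopolymer run length, or 0)."""
    if set(strand) - set(flow_order):
        raise ValueError("strand contains bases not in flow_order")
    signals, i = [], 0
    while i < len(strand):                 # each full cycle consumes >= 1 base
        for base in flow_order:
            run = 0
            while i < len(strand) and strand[i] == base:
                run += 1
                i += 1
            signals.append(run)
    return signals

def call_bases(signals, flow_order="TACG"):
    """Invert a (possibly noisy) flowgram by rounding each signal to a run length."""
    return "".join(flow_order[k % len(flow_order)] * max(0, round(s))
                   for k, s in enumerate(signals))

ideal = flowgram("TTACGGA")
noisy = [2.1, 0.9, 1.05, 1.8, 0.1, 1.2, 0.05, 0.0]   # hypothetical measured light
print(ideal, call_bases(ideal), call_bases(noisy))
```

Because every base in the strand appears in the flow order, each full cycle of flows incorporates at least one base and the loop terminates. Long homopolymer runs are the weak point of such a model, since the signal for n and n+1 identical bases becomes harder to distinguish as n grows.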

Pyrophosphate can be detected by a number of different methodologies, and various enzymatic methods have been previously described (see, e.g., Reeves, et al. (1969) Anal Biochem. 28:282-287; Guillory, et al. (1971) Anal Biochem. 39:170-180; Johnson, et al. (1968) Anal Biochem. 15:273; Cook, et al. (1978) Anal Biochem. 91:557-565; and Drake, et al. (1979) Anal Biochem. 94:117-120).

In one embodiment, PPi is detected enzymatically (e.g., by the generation of light). Such methods enable a nucleotide to be identified in a given target position, and the DNA to be sequenced simply and rapidly while avoiding the need for electrophoresis and the use of potentially dangerous radiolabels.

In one embodiment, PPi and a coupled luciferase-luciferin reaction are used to generate light for detection. In another embodiment, PPi and a coupled sulfurylase/luciferase reaction are used to generate light for detection as described in U.S. Pat. No. 6,902,921, the contents of which are hereby expressly incorporated herein by reference. In one embodiment, the sulfurylase is thermostable. In some embodiments, either or both of the sulfurylase and luciferase are immobilized on one or more mobile solid supports disposed at each reaction site.

In another embodiment, the nucleotide sequence of the oligonucleotide tag may be determined according to the methods described in PCT Publication No. WO 01/23610, the contents of which are incorporated herein by reference. Briefly, a target nucleotide sequence can be determined by generating its complement using the polymerase reaction to extend a suitable primer, and characterizing the successive incorporation of bases that generate the complement sequence. The target sequence is, typically, immobilized on a solid support. Each of the different bases A, T, G, or C is then brought, by sequential addition, into contact with the target, and any incorporation events are detected via a suitable label attached to the base.

A labeled base is incorporated into the complementary sequence by the use of a polymerase, e.g., a polymerase with a 3′ to 5′ exonuclease activity (e.g., DNA polymerase I, the Klenow fragment, DNA polymerase III, T4 DNA polymerase, and T7 DNA polymerase). Following detection of the incorporated labeled base, the polymerase replaces the terminally labeled base with a corresponding unlabeled base, thus permitting further sequencing to occur.

In yet another embodiment, the nucleotide sequence of the oligonucleotide tag is determined by the use of single-molecule sequencing-by-synthesis methods described in, for example, PCT Publication No. WO 2005/080605, the entire contents of which are expressly incorporated by reference. The benefit of using this technology is that it eliminates the need for DNA amplification prior to sequencing, thus avoiding the introduction of amplification errors and bias. Briefly, the encoding oligonucleotide is hybridized to a universal primer immobilized on a solid surface. The oligonucleotide:primer duplexes are visualized by, e.g., illuminating the surface with a laser and imaging with a digital TV camera connected to a microscope, and the positions of all the duplexes on the surface are recorded. DNA polymerase and one type of fluorescently labeled nucleotide, e.g., A, are added to the surface, and the nucleotide is incorporated into the appropriate primers. Subsequently, the polymerase and the unincorporated nucleotides are washed from the surface, and the incorporated nucleotide is visualized by, e.g., illuminating the surface with a laser and imaging with a camera as before to record the positions of the incorporated nucleotides. The fluorescent label is removed from each incorporated nucleotide and the process is repeated with the next nucleotide, e.g., G, stepping through A, C, G, T, until the desired read length is achieved.

One group of fluorescent dyes suitable for this method of sequencing is fluorescence resonance energy transfer (FRET) dyes, including energy donor and acceptor fluorescent dyes, such as, for example, Cy3 and Cy5, and linkers. FRET is a phenomenon described in, for example, Selvin (1995) Methods in Enzym. 246:300. FRET can detect the incorporation of multiple nucleotides into a single oligonucleotide molecule and is, thus, useful for sequencing the encoding oligonucleotides of the invention. Sequencing methods using FRET are described in, for example, PCT Publication No. WO 2005/080605, the entire contents of which are expressly incorporated by reference. Alternatively, quantum dots can be used as a labeling moiety on the different types of nucleotides for use in sequencing reactions.

Once single ligands are identified by the above-described process, various levels of analysis can be applied to yield structure-activity relationship information and to guide further optimization of the affinity, specificity and bioactivity of the ligand. For ligands derived from the same scaffold, three-dimensional molecular modeling can be employed to identify significant structural features common to the ligands, thereby generating families of small-molecule ligands that presumably bind at a common site on the target biomolecule.

A variety of screening approaches can be used to obtain ligands that possess high affinity for one target but significantly weaker affinity for another closely related target. One screening strategy is to identify ligands for both biomolecules in parallel experiments and to subsequently eliminate common ligands by a cross-referencing comparison. In this method, ligands for each biomolecule can be separately identified as disclosed above. This method is compatible with both immobilized target biomolecules and target biomolecules free in solution.
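The cross-referencing comparison amounts to a set difference over decoded structures. The sketch below is illustrative only: it assumes each parallel selection has been reduced to read counts per decoded structure, and the `min_reads` noise threshold, the structures and the counts are hypothetical.

```python
def selective_ligands(target_hits, counter_hits, min_reads=1):
    """Structures selected against the target but not against the related
    counter-target. *_hits map structure -> read count from parallel
    selections; min_reads is an assumed noise threshold."""
    on_target = {s for s, n in target_hits.items() if n >= min_reads}
    off_target = {s for s, n in counter_hits.items() if n >= min_reads}
    return on_target - off_target

# Hypothetical decoded read counts from two parallel selections.
target = {("A2", "B1"): 40, ("A1", "B2"): 12, ("A3", "B1"): 3}
counter = {("A1", "B2"): 15, ("A3", "B1"): 1}
print(selective_ligands(target, counter))               # strict: any counter read excludes
print(selective_ligands(target, counter, min_reads=2))  # ignores single-read counter hits
```

The threshold is a design choice: the strict setting discards a candidate on a single read from the counter-selection, while a higher threshold tolerates sequencing noise at the risk of retaining weak cross-binders.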

For immobilized target biomolecules, another strategy is to add a preselection step that eliminates from the library all ligands that bind to the non-target biomolecule. For example, a first biomolecule can be contacted with an encoded library as described above. Compounds which do not bind to the first biomolecule are then separated from any first biomolecule-ligand complexes which form. The second biomolecule is then contacted with the compounds which did not bind to the first biomolecule. Compounds which bind to the second biomolecule can be identified as described above and have significantly greater affinity for the second biomolecule than for the first biomolecule.

A ligand for a biomolecule of unknown function which is identified by the method disclosed above can also be used to determine the biological function of the biomolecule. This is advantageous because, although new gene sequences continue to be identified, the functions of the proteins encoded by these sequences, and the validity of these proteins as targets for new drug discovery and development, are difficult to determine and represent perhaps the most significant obstacle to applying genomic information to the treatment of disease. Target-specific ligands obtained through the process described in this invention can be effectively employed in whole-cell biological assays or in appropriate animal models to understand both the function of the target protein and the validity of the target protein for therapeutic intervention. This approach can also confirm that the target is specifically amenable to small-molecule drug discovery.

In one embodiment, one or more compounds within a library of the invention are identified as ligands for a particular biomolecule. These compounds can then be assessed in an in vitro assay for the ability to bind to the biomolecule. Preferably, the functional moieties of the binding compounds are synthesized without the oligonucleotide tag or linker moiety, and these functional moieties are assessed for the ability to bind to the biomolecule.

The effect that binding of the functional moieties has on the function of the biomolecule can also be assessed using in vitro cell-free or cell-based assays. For a biomolecule having a known function, the assay can include a comparison of the activity of the biomolecule in the presence and absence of the ligand, for example, by direct measurement of the activity, such as enzymatic activity, or by an indirect measure, such as a cellular function that is influenced by the biomolecule. If the biomolecule is of unknown function, a cell which expresses the biomolecule can be contacted with the ligand, and the effect of the ligand on the viability, function, phenotype, and/or gene expression of the cell can be assessed. The in vitro assay can be, for example, a cell death assay, a cell proliferation assay or a viral replication assay. For example, if the biomolecule is a protein expressed by a virus, a cell infected with the virus can be contacted with a ligand for the protein. The effect of the binding of the ligand to the protein on viral viability can then be assessed.
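For a biomolecule of known function, the comparison of activity in the presence and absence of the ligand is often summarized as percent inhibition. The Python sketch below assumes replicate activity readings plus a background signal (for example, a no-enzyme control); the readings are made-up numbers for illustration, not data from this application.

```python
from statistics import mean


def percent_inhibition(with_ligand, without_ligand, background=0.0):
    """100 * (1 - signal_with / signal_without), after background subtraction."""
    signal_with = mean(with_ligand) - background
    signal_without = mean(without_ligand) - background
    if signal_without <= 0:
        raise ValueError("uninhibited signal must exceed background")
    return 100.0 * (1.0 - signal_with / signal_without)


# Hypothetical replicate readings (arbitrary units).
inhibition = percent_inhibition([30, 32, 28], [100, 98, 102], background=10)
print(f"{inhibition:.1f}% inhibition")  # 77.8% inhibition
```

Subtracting the background before taking the ratio matters: without it, the same readings would suggest only 70% inhibition.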

A ligand identified by the method of the invention can also be assessed in an in vivo model or in a human. For example, the ligand can be evaluated in an animal or organism which produces the biomolecule. Any resulting change in the health status (e.g., disease progression) of the animal or organism can be determined.

For a biomolecule of unknown function, such as a protein or a nucleic acid molecule, the effect of a ligand which binds to the biomolecule on a cell or organism which produces the biomolecule can provide information regarding the biological function of the biomolecule. For example, the observation that a particular cellular process is inhibited in the presence of the ligand indicates that the process depends, at least in part, on the function of the biomolecule.

Ligands identified using the methods of the invention can also be used as affinity reagents for the biomolecule to which they bind. In one embodiment, such ligands are used to effect affinity purification of the biomolecule, for example, via chromatography of a solution comprising the biomolecule using a solid phase to which one or more such ligands are attached.

In addition to the screening of encoded libraries as described herein, other traditional drug discovery methods, such as phage display, differential display (mRNA display), and aptamer/SELEX, could benefit from the methods of the invention, which eliminate the introduction of amplification errors and biases. For example, multiple rounds of selection using phage display (described in, for example, PCT Publication Nos. WO91/18980, WO91/19818, and WO92/18619, and U.S. Pat. No. 5,223,409, the entire contents of each of which are incorporated herein by reference) can cause host toxicity and, consequently, loss or under-representation of desired library members (see, e.g., Daugherty, P. S., et al. (1999) Protein Engineering 12(7):613-621 and Holt, L. J., et al. (2000) Nucleic Acids Res. 28(15):E72). Moreover, methods such as Systematic Evolution of Ligands by EXponential enrichment (also known as SELEX, which is described in, for example, U.S. Pat. Nos. 5,654,151, 5,503,978, 5,567,588 and 5,270,163, as well as PCT Publication Nos. WO 96/38579 and WO9927133A1, the entire contents of each of which are incorporated herein by reference) introduce biases due to the need for multiple rounds of selection, i.e., partitioning unbound nucleic acids from those nucleic acids which have bound specifically to a target molecule, and multiple rounds of amplification, by reverse transcription and PCR, of the nucleic acids that have bound to the target. Similarly, methods of selection like differential display (described in, for example, U.S. Pat. Nos. 5,580,726 and 5,700,644, the entire contents of each of which are incorporated herein by reference) rely on multiple rounds of PCR amplification, which also leads to unequal representation of the clones in the library. Thus, the foregoing multi-step selection processes may benefit from the methods described herein, which employ massively parallel sequencing approaches (such as, for example, a pyrophosphate-based sequencing method or a single-molecule sequencing-by-synthesis method) that lead to the accurate identification of a compound with a desired biological activity without the need for any nucleic acid amplification.
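How repeated amplification distorts library representation can be seen from the idealized exponential model N = N₀(1 + e)ⁿ, where e is the per-cycle efficiency and n is the number of cycles. The Python sketch below compares two hypothetical clones that start at equal abundance but amplify with assumed efficiencies of 0.95 and 0.90. It ignores plateau effects and any renormalization between rounds, so the numbers are illustrative only.

```python
def copies_after_pcr(initial_copies: float, efficiency: float, cycles: int) -> float:
    """Idealized exponential amplification: N = N0 * (1 + efficiency) ** cycles."""
    return initial_copies * (1.0 + efficiency) ** cycles


# Two hypothetical clones, equal starting abundance, slightly different efficiencies.
clone_a = clone_b = 1000.0
rounds, cycles_per_round = 3, 25
for _ in range(rounds):
    clone_a = copies_after_pcr(clone_a, 0.95, cycles_per_round)
    clone_b = copies_after_pcr(clone_b, 0.90, cycles_per_round)

bias = clone_a / clone_b
print(f"A:B ratio after {rounds} rounds of {cycles_per_round} cycles: {bias:.1f}")
```

Under these assumptions a five-point difference in per-cycle efficiency compounds, over 75 cycles, into roughly a sevenfold over-representation of clone A, even though both clones started at equal abundance.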

This invention is further illustrated by the following examples, which should not be construed as limiting. The contents of all references, patents and published patent applications cited throughout this application, as well as the Figures and the Sequence Listing, are hereby incorporated by reference.

EXAMPLES

Example 1

Synthesis and Characterization of a Library on the Order of 10⁵ Members

The synthesis of a library comprising on the order of 10⁵ distinct members was accomplished using the following reagents:

Single letter codes for deoxyribonucleotides:

-   A=adenosine
-   C=cytidine
-   G=guanosine
-   T=thymidine

Building block precursors:

Oligonucleotide tags:

Tag number  Top strand          Bottom strand       SEQ ID NOs (top, bottom)
1.1         3′-PO₄-GCAACGAAG    ACCGTTGCT-PO₃-5′    (SEQ ID NO:1) (SEQ ID NO:2)
1.2         5′-PO₃-GCGTACAAG    ACCGCATGT-PO₃-5′    (SEQ ID NO:3) (SEQ ID NO:4)
1.3         5′-PO₃-GCTCTGTAG    ACCGAGACA-PO₃-5′    (SEQ ID NO:5) (SEQ ID NO:6)
1.4         5′-PO₃-GTGCCATAG    ACCACGGTA-PO₃-5′    (SEQ ID NO:7) (SEQ ID NO:8)
1.5         5′-PO₃-GTTGACCAG    ACCAACTGG-PO₃-5′    (SEQ ID NO:9) (SEQ ID NO:10)
1.6         5′-PO₃-CGACTTGAC    CAAGTCGCA-PO₃-5′    (SEQ ID NO:11) (SEQ ID NO:12)
1.7         5′-PO₃-CGTAGTCAG    ACGCATCAG-PO₃-5′    (SEQ ID NO:13) (SEQ ID NO:14)
1.8         5′-PO₃-CCAGCATAG    ACGGTCGTA-PO₃-5′    (SEQ ID NO:15) (SEQ ID NO:16)
1.9         5′-PO₃-CCTACAGAG    ACGGATGTC-PO₃-5′    (SEQ ID NO:17) (SEQ ID NO:18)
1.10        5′-PO₃-CTGAACGAG    CGTTCAGCA-PO₃-5′    (SEQ ID NO:19) (SEQ ID NO:20)
1.11        5′-PO₃-CTCCAGTAG    ACGAGGTCA-PO₃-5′    (SEQ ID NO:21) (SEQ ID NO:22)
1.12        5′-PO₃-TAGGTCCAG    ACATCCAGG-PO₃-5′    (SEQ ID NO:23) (SEQ ID NO:24)
2.1         5′-PO₃-GCGTGTTGT    TCCGCACAA-PO₃-5′    (SEQ ID NO:25) (SEQ ID NO:26)
2.2         5′-PO₃-GCTTGGAGT    TCCGAACCT-PO₃-5′    (SEQ ID NO:27) (SEQ ID NO:28)
2.3         5′-PO₃-GTCAAGCGT    TCCAGTTCG-PO₃-5′    (SEQ ID NO:29) (SEQ ID NO:30)
2.4         5′-PO₃-CAAGAGCGT    TCGTTCTCG-PO₃-5′    (SEQ ID NO:31) (SEQ ID NO:32)
2.5         5′-PO₃-CAGTTCGGT    TCGTCAAGC-PO₃-5′    (SEQ ID NO:33) (SEQ ID NO:34)
2.6         5′-PO₃-CGAAGGAGT    TCGCTTCCT-PO₃-5′    (SEQ ID NO:35) (SEQ ID NO:36)
2.7         5′-PO₃-CGGTGTTGT    TCGCCACAA-PO₃-5′    (SEQ ID NO:37) (SEQ ID NO:38)
2.8         5′-PO₃-CGTTGCTGT    TCGCAACGA-PO₃-5′    (SEQ ID NO:39) (SEQ ID NO:40)
2.9         5′-PO₃-CCGATCTGT    TCGGCTAGA-PO₃-5′    (SEQ ID NO:41) (SEQ ID NO:42)
2.10        5′-PO₃-CCTTCTCGT    TCGGAAGAG-PO₃-5′    (SEQ ID NO:43) (SEQ ID NO:44)
2.11        5′-PO₃-TGAGTCCGT    TCACTCAGG-PO₃-5′    (SEQ ID NO:45) (SEQ ID NO:46)
2.12        5′-PO₃-TGCTACGGT    TCAGATTGC-PO₃-5′    (SEQ ID NO:47) (SEQ ID NO:48)
3.1         5′-PO₃-GTGCGTTGA    CACACGCAA-PO₃-5′    (SEQ ID NO:49) (SEQ ID NO:50)
3.2         5′-PO₃-GTTGGCAGA    CACAACCGT-PO₃-5′    (SEQ ID NO:51) (SEQ ID NO:52)
3.3         5′-PO₃-CCTGTAGGA    CAGGACATC-PO₃-5′    (SEQ ID NO:53) (SEQ ID NO:54)
3.4         5′-PO₃-CTGCGTAGA    CAGACGCAT-PO₃-5′    (SEQ ID NO:55) (SEQ ID NO:56)
3.5         5′-PO₃-CTTACGCGA    CAGAATGCG-PO₃-5′    (SEQ ID NO:57) (SEQ ID NO:58)
3.6         5′-PO₃-TGGTCACGA    CAACCAGTG-PO₃-5′    (SEQ ID NO:59) (SEQ ID NO:60)
3.7         5′-PO₃-TCAGAGCGA    CAAGTCTCG-PO₃-5′    (SEQ ID NO:61) (SEQ ID NO:62)
3.8         5′-PO₃-TTGCTCGGA    CAAACGAGC-PO₃-5′    (SEQ ID NO:63) (SEQ ID NO:64)
3.9         5′-PO₃-GCAGTTGGA    CACGTCAAC-PO₃-5′    (SEQ ID NO:65) (SEQ ID NO:66)
3.10        5′-PO₃-GCCTGAAGA    CACGGACTT-PO₃-5′    (SEQ ID NO:67) (SEQ ID NO:68)
3.11        5′-PO₃-GTAGCCAGA    CACATCGGT-PO₃-5′    (SEQ ID NO:69) (SEQ ID NO:70)
3.12        5′-PO₃-GTCGCTTGA    CACAGCCAA-PO₃-5′    (SEQ ID NO:71) (SEQ ID NO:72)
4.1         5′-PO₃-GCCTAAGTT    CTCGGATTC-PO₃-5′    (SEQ ID NO:73) (SEQ ID NO:74)
4.2         5′-PO₃-GTAGTGCTT    CTCATCACG-PO₃-5′    (SEQ ID NO:75) (SEQ ID NO:76)
4.3         5′-PO₃-GTCGAAGTT    CTCAGCTTC-PO₃-5′    (SEQ ID NO:77) (SEQ ID NO:78)
4.4         5′-PO₃-GTTTCGGTT    CTCAAAGCC-PO₃-5′    (SEQ ID NO:79) (SEQ ID NO:80)
4.5         5′-PO₃-CAGCGTTTT    CTGTCGCAA-PO₃-5′    (SEQ ID NO:81) (SEQ ID NO:82)
4.6         5′-PO₃-CATACGCTT    CTGTATGCG-PO₃-5′    (SEQ ID NO:83) (SEQ ID NO:84)
4.7         5′-PO₃-CGATCTGTT    CTGCTAGAC-PO₃-5′    (SEQ ID NO:85) (SEQ ID NO:86)
4.8         5′-PO₃-CGCTTTGTT    CTGCGAAAC-PO₃-5′    (SEQ ID NO:87) (SEQ ID NO:88)
4.9         5′-PO₃-CCACAGTTT    CTGGTGTCA-PO₃-5′    (SEQ ID NO:89) (SEQ ID NO:90)
4.10        5′-PO₃-CCTGAAGTT    CTGGACTTC-PO₃-5′    (SEQ ID NO:91) (SEQ ID NO:92)
4.11        5′-PO₃-CTGACGATT    CTGACTGCT-PO₃-5′    (SEQ ID NO:93) (SEQ ID NO:94)
4.12        5′-PO₃-CTCCACTTT    CTGAGGTGA-PO₃-5′    (SEQ ID NO:95) (SEQ ID NO:96)
5.1         5′-PO₃-ACCAGAGCC    AATGGTCTC-PO₃-5′    (SEQ ID NO:97) (SEQ ID NO:98)
5.2         5′-PO₃-ATCCGCACC    AATAGGCGT-PO₃-5′    (SEQ ID NO:99) (SEQ ID NO:100)
5.3         5′-PO₃-GACGACACC    AACTGCTGT-PO₃-5′    (SEQ ID NO:101) (SEQ ID NO:102)
5.4         5′-PO₃-GGATGGACC    AACCTACCT-PO₃-5′    (SEQ ID NO:103) (SEQ ID NO:104)
5.5         5′-PO₃-GCAGAAGCC    AACGTCTTC-PO₃-5′    (SEQ ID NO:105) (SEQ ID NO:106)
5.6         5′-PO₃-GCCATGTCC    AACGGTACA-PO₃-5′    (SEQ ID NO:107) (SEQ ID NO:108)
5.7         5′-PO₃-GTCTGCTCC    AACAGACGA-PO₃-5′    (SEQ ID NO:109) (SEQ ID NO:110)
5.8         5′-PO₃-CGACAGACC    AAGCTGTCT-PO₃-5′    (SEQ ID NO:111) (SEQ ID NO:112)
5.9         5′-PO₃-CGCTACTCC    AAGCGATGA-PO₃-5′    (SEQ ID NO:113) (SEQ ID NO:114)
5.10        5′-PO₃-CCACAGACC    AAGGTGTCT-PO₃-5′    (SEQ ID NO:115) (SEQ ID NO:116)
5.11        5′-PO₃-CCTCTCTCC    AAGGAGAGA-PO₃-5′    (SEQ ID NO:117) (SEQ ID NO:118)
5.12        5′-PO₃-CTCCTAGCC    AAGAGCATC-PO₃-5′    (SEQ ID NO:119) (SEQ ID NO:120)

1× ligase buffer: 50 mM Tris, pH 7.5; 10 mM dithiothreitol; 10 mM MgCl₂; 2.5 mM ATP; 50 mM NaCl.

10× ligase buffer: 500 mM Tris, pH 7.5; 100 mM dithiothreitol; 100 mM MgCl₂; 25 mM ATP; 500 mM NaCl.

Cycle 1

To each of twelve PCR tubes was added 50 μL of a 1 mM solution of Compound 1 in water; 75 μL of a 0.80 mM solution of one of Tags 1.1-1.12; 15 μL 10× ligase buffer and 10 μL deionized water. The tubes were heated to 95° C. for 1 minute and then cooled to 16° C. over 10 minutes. To each tube was added 5,000 units T4 DNA ligase (2.5 μL of a 2,000,000 unit/mL solution (New England Biolabs, Cat. No. M0202)) in 50 μL 1× ligase buffer, and the resulting solutions were incubated at 16° C. for 16 hours.
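The quantities in the Cycle 1 ligation can be cross-checked with simple unit arithmetic (1 mM × 1 μL = 1 nmol). The Python sketch below uses only the volumes and concentrations stated above; the helper names are illustrative. It assumes that the 10× buffer is diluted into the 150 μL annealing mixture (50 μL + 75 μL + 15 μL + 10 μL).

```python
def nmol(volume_ul: float, conc_mm: float) -> float:
    """Amount in nmol from a volume in uL and a concentration in mM."""
    return volume_ul * conc_mm


def diluted_strength(stock_x: float, stock_ul: float, total_ul: float) -> float:
    """Working strength (in 'x' units) after diluting a concentrated buffer."""
    return stock_x * stock_ul / total_ul


def enzyme_units(volume_ul: float, units_per_ml: float) -> float:
    """Enzyme units delivered by a volume (uL) of a stock given in units/mL."""
    return volume_ul * units_per_ml / 1000.0


headpiece = nmol(50, 1.0)            # Compound 1
tag = nmol(75, 0.80)                 # one of Tags 1.1-1.12
annealing_ul = 50 + 75 + 15 + 10     # Compound 1 + tag + 10x buffer + water
print(f"Compound 1: {headpiece:.0f} nmol; tag: {tag:.0f} nmol ({tag / headpiece:.1f} equiv)")
print(f"Ligase buffer during annealing: {diluted_strength(10, 15, annealing_ul):.1f}x")
print(f"T4 DNA ligase: {enzyme_units(2.5, 2_000_000):.0f} units")
```

The output (50 nmol of Compound 1 and 60 nmol of tag, i.e. 1.2 equivalents of tag; a 1.0× buffer; 5,000 units of ligase) agrees with the amounts given in the procedure.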

Following ligation, samples were transferred to 1.5 mL Eppendorf tubes and treated with 20 μL 5 M aqueous NaCl and 500 μL cold (−20° C.) ethanol, and held at −20° C. for 1 hour. Following centrifugation, the supernatant was removed and the pellet was washed with 70% aqueous ethanol at −20° C. Each of the pellets was then dissolved in 150 μL of 150 mM sodium borate buffer, pH 9.4.
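The ethanol precipitation can be sanity-checked in the same way. The Python sketch below assumes that the entire ligation reaction was transferred (150 μL annealing mix plus 50 μL of ligase in 1× buffer plus 2.5 μL of enzyme stock, i.e. 202.5 μL; whether the 2.5 μL is counted within the 50 μL is not stated, so this is an assumption). It also ignores the volume contraction that occurs when ethanol and water are mixed. The same helper estimates the ethanol needed for the 70% v/v precipitation performed after piperidine deprotection later in Cycle 1.

```python
def ethanol_fraction(ethanol_vol: float, aqueous_vol: float) -> float:
    """Final ethanol fraction (v/v), ignoring volume contraction on mixing."""
    return ethanol_vol / (ethanol_vol + aqueous_vol)


def ethanol_needed(target_fraction: float, aqueous_vol: float) -> float:
    """Ethanol volume (same units as aqueous_vol) giving the target v/v fraction."""
    return target_fraction * aqueous_vol / (1.0 - target_fraction)


# Precipitation after ligation (volumes in uL).
ligation_ul = 150 + 50 + 2.5   # assumption: whole ligation reaction transferred
nacl_ul = 20                   # 5 M aqueous NaCl
aqueous_ul = ligation_ul + nacl_ul
ethanol_ul = 500
nacl_mm = 5000 * nacl_ul / (aqueous_ul + ethanol_ul)   # 5 M = 5000 mM
print(f"Ethanol: {ethanol_fraction(ethanol_ul, aqueous_ul):.0%}; NaCl: {nacl_mm:.0f} mM")

# Deprotection step (volumes in mL): 2.5 mL buffer + 0.1 mL piperidine + 0.1 mL wash.
print(f"Ethanol for 70% v/v: {ethanol_needed(0.70, 2.5 + 0.1 + 0.1):.1f} mL")
```

Under these assumptions the ligation precipitation ends up near 69% ethanol and about 138 mM NaCl, and roughly 6.3 mL of ethanol would bring 2.7 mL of solution to 70% v/v.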

Stock solutions comprising one each of building block precursors BB1 to BB12, N,N-diisopropylethanolamine and O-(7-azabenzotriazol-1-yl)-1,1,3,3-tetramethyluronium hexafluorophosphate, each at a concentration of 0.25 M, were prepared in DMF and stirred at room temperature for 20 minutes. The building block precursor solutions were added to each of the pellet solutions described above to provide a 10-fold excess of building block precursor relative to linker. The resulting solutions were stirred. An additional 10 equivalents of building block precursor was added to the reaction mixture after 20 minutes, and another 10 equivalents after 40 minutes. The final concentration of DMF in the reaction mixture was 22%. The reaction solutions were then stirred overnight at 4° C. Reaction progress was monitored by RP-HPLC using 50 mM aqueous tetraethylammonium acetate (pH 7.5) and acetonitrile, with a gradient of 2-46% acetonitrile over 14 min. The reaction was stopped when ~95% of the starting material (linker) was acylated. Following acylation, the reaction mixtures were pooled and lyophilized to dryness. The lyophilized material was then purified by HPLC, and the fractions corresponding to the library (acylated product) were pooled and lyophilized.
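The stopping criterion (about 95% of the linker acylated) can be expressed as a fraction computed from integrated RP-HPLC peak areas. The Python sketch below assumes that unreacted linker and acylated product give comparable detector response. That is plausible when absorbance is dominated by the oligonucleotide, but the text does not state it. The peak areas are hypothetical.

```python
def fraction_acylated(product_area: float, linker_area: float) -> float:
    """Fraction of linker converted, from integrated peak areas (equal response assumed)."""
    total = product_area + linker_area
    if total <= 0:
        raise ValueError("no integrated signal")
    return product_area / total


def reaction_complete(product_area: float, linker_area: float, threshold: float = 0.95) -> bool:
    """True once the acylated fraction reaches the threshold (default ~95%)."""
    return fraction_acylated(product_area, linker_area) >= threshold


# Hypothetical peak areas from successive HPLC injections.
for product, linker in [(7200.0, 2800.0), (9000.0, 1000.0), (9600.0, 400.0)]:
    done = reaction_complete(product, linker)
    print(f"{fraction_acylated(product, linker):.0%} acylated -> stop: {done}")
```

If the two species were found to respond differently at the detection wavelength, each area would need to be divided by its own response factor before the ratio is taken.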

The library was dissolved in 2.5 mL of 0.01 M sodium phosphate buffer (pH 8.2), and 0.1 mL of piperidine (4% v/v) was added. The addition of piperidine results in turbidity that does not dissolve on mixing. The reaction mixture was stirred at room temperature for 50 minutes, and the turbid solution was then centrifuged (14,000 rpm). The supernatant was removed using a 200 μL pipette, and the pellet was resuspended in 0.1 mL of water. This aqueous wash was combined with the supernatant, and the pellet was discarded. The deprotected library was precipitated from solution by addition of excess ice-cold ethanol so as to bring the final concentration of ethanol to 70% v/v. Centrifugation of the aqueous ethanol mixture gave a white pellet comprising the library. The pellet was washed once with cold 70% aqueous ethanol. After removal of solvent, the pellet was dried in air (~5 min) to remove traces of ethanol and then used in Cycle 2. The tags and corresponding building block precursors used in Cycle 1 are set forth in Table 1, below.

TABLE 1
Building Block Precursor    Tag
BB1                         1.11
BB2                         1.6
BB3                         1.2
BB4                         1.8
BB5                         1.1
BB6                         1.10
BB7                         1.12
BB8                         1.5
BB9                         1.4
BB10                        1.3
BB11                        1.7
BB12                        1.9

Cycles 2-5

For each of these cycles, the combined solution resulting from the previous cycle was divided into 12 equal aliquots of 50 μL each and placed in PCR tubes. To each tube was added a solution comprising a different tag, and ligation, purification and acylation were performed as described for Cycle 1, except that for Cycles 3-5 the HPLC purification step described for Cycle 1 was omitted. The correspondence between tags and building block precursors for Cycles 2-5 is presented in Table 2.

The products of Cycle 5 were ligated with the closing primer shown below, using the method described above for ligation of tags.

Top strand: 5′-PO₃-GGCACATTGATTTGGGAGTCA
Bottom strand: GTGTAACTAAACCCTCAGT-PO₃-5′
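The bottom strand of the closing primer is written 3′→5′ (its 5′-phosphate is on the right), so it has to be read in reverse before the two strands are compared. The Python sketch below uses a hypothetical `reverse_complement` helper built on Python's `str.translate`, not any particular bioinformatics library. It confirms that, as printed, the 19-nt bottom strand is complementary to the top strand from the top strand's third base onward, which leaves the first two top-strand bases (GG) unpaired.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


top = "GGCACATTGATTTGGGAGTCA"             # printed 5'->3'
bottom_3_to_5 = "GTGTAACTAAACCCTCAGT"     # printed 3'->5'
bottom = bottom_3_to_5[::-1]              # re-read 5'->3'

paired = reverse_complement(top[2:]) == bottom
print(f"bottom pairs with top[2:]: {paired}; unpaired top bases: {top[:2]}")
```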

TABLE 2
Building Block Precursor    Cycle 2 Tag    Cycle 3 Tag    Cycle 4 Tag    Cycle 5 Tag
BB1                         2.7            3.7            4.7            5.7
BB2                         2.8            3.8            4.8            5.8
BB3                         2.2            3.2            4.2            5.2
BB4                         2.10           3.10           4.10           5.10
BB5                         2.1            3.1            4.1            5.1
BB6                         2.12           3.12           4.12           5.12
BB7                         2.5            3.5            4.5            5.5
BB8                         2.6            3.6            4.6            5.6
BB9                         2.4            3.4            4.4            5.4
BB10                        2.3            3.3            4.3            5.3
BB11                        2.9            3.9            4.9            5.9
BB12                        2.11           3.11           4.11           5.11

Results:

The synthetic procedure described above is capable of producing a library comprising 12⁵ (248,832, or about 249,000) different structures. The synthesis of the library was monitored by gel electrophoresis of the product of each cycle. The results for each of the five cycles and for the final library following ligation of the closing primer are illustrated in FIG. 7. The compound labeled “head piece” is Compound 1. The figure shows that each cycle results in the expected molecular weight increase and that the products of each cycle are substantially homogeneous with regard to molecular weight.
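Tables 1 and 2 together form the codebook used to read a library member's structure from its tags. The Python sketch below transcribes those tables into a dictionary and decodes a tag series into building block precursors. The `decode` function and the example tag series are illustrative and are not part of the procedure. The sketch also confirms that the 60 tags are distinct and that five cycles of 12 building blocks give 12⁵ = 248,832 combinations.

```python
# Table 1 (Cycle 1): building block precursor -> tag number.
CYCLE_1 = {"BB1": "1.11", "BB2": "1.6", "BB3": "1.2", "BB4": "1.8",
           "BB5": "1.1", "BB6": "1.10", "BB7": "1.12", "BB8": "1.5",
           "BB9": "1.4", "BB10": "1.3", "BB11": "1.7", "BB12": "1.9"}
# Table 2 (Cycles 2-5): the same tag suffix is used in every cycle.
CYCLES_2_TO_5 = {"BB1": 7, "BB2": 8, "BB3": 2, "BB4": 10, "BB5": 1, "BB6": 12,
                 "BB7": 5, "BB8": 6, "BB9": 4, "BB10": 3, "BB11": 9, "BB12": 11}

# Invert the tables into a tag -> building block lookup.
CODEBOOK = {tag: bb for bb, tag in CYCLE_1.items()}
for cycle in range(2, 6):
    CODEBOOK.update({f"{cycle}.{n}": bb for bb, n in CYCLES_2_TO_5.items()})


def decode(tags: list[str]) -> tuple[str, ...]:
    """Map a Cycle 1-5 tag series to the building block precursors it encodes."""
    return tuple(CODEBOOK[t] for t in tags)


print(len(CODEBOOK), 12 ** 5)                              # 60 248832
print(decode(["1.11", "2.7", "3.8", "4.2", "5.10"]))       # ('BB1', 'BB1', 'BB2', 'BB3', 'BB4')
```

A lookup failure (a `KeyError`) on an unexpected tag is a useful signal in its own right, because it indicates a sequencing error or a tag that does not belong to this library.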

Example 2

Synthesis and Characterization of a Library on the Order of 10⁸ Members

The synthesis of a library comprising on the order of 10⁸ distinct members was accomplished using the following reagents:

Compound 2:

Single letter codes for deoxyribonucleotides:

-   A=adenosine
-   C=cytidine
-   G=guanosine
-   T=thymidine

Building block precursors:

TABLE 3
Oligonucleotide tags used in cycle 1:

Tag Number  Top Strand Sequence            Bottom Strand Sequence         SEQ ID NOs (top, bottom)
1.1         5′-PO3-AAATCGATGTGGTCACTCAG    5′-PO3-GAGTGACCACATCGATTTGG    (SEQ ID NO:121) (SEQ ID NO:122)
1.2         5′-PO3-AAATCGATGTGGACTAGGAG    5′-PO3-CCTAGTCCACATCGATTTGG    (SEQ ID NO:123) (SEQ ID NO:124)
1.3         5′-PO3-AAATCGATGTGCCGTATGAG    5′-PO3-CATACGGCACATCGATTTGG    (SEQ ID NO:125) (SEQ ID NO:126)
1.4         5′-PO3-AAATCGATGTGCTGAAGGAG    5′-PO3-CCTTCAGCACATCGATTTGG    (SEQ ID NO:127) (SEQ ID NO:128)
1.5         5′-PO3-AAATCGATGTGGACTAGCAG    5′-PO3-GCTAGTCCACATCGATTTGG    (SEQ ID NO:129) (SEQ ID NO:130)
1.6         5′-PO3-AAATCGATGTGCGCTAAGAG    5′-PO3-CTTAGCGCACATCGATTTGG    (SEQ ID NO:131) (SEQ ID NO:132)
1.7         5′-PO3-AAATCGATGTGAGCCGAGAG    5′-PO3-CTCGGCTCACATCGATTTGG    (SEQ ID NO:133) (SEQ ID NO:134)
1.8         5′-PO3-AAATCGATGTGCCGTATCAG    5′-PO3-GATACGGCACATCGATTTGG    (SEQ ID NO:135) (SEQ ID NO:136)
1.9         5′-PO3-AAATCGATGTGCTGAAGCAG    5′-PO3-GCTTCAGCACATCGATTTGG    (SEQ ID NO:137) (SEQ ID NO:138)
1.10        5′-PO3-AAATCGATGTGTGCGAGTAG    5′-PO3-ACTCGCACACATCGATTTGG    (SEQ ID NO:139) (SEQ ID NO:140)
1.11        5′-PO3-AAATCGATGTGTTTGGCGAG    5′-PO3-CGCCAAACACATCGATTTGG    (SEQ ID NO:141) (SEQ ID NO:142)
1.12        5′-PO3-AAATCGATGTGCGCTAACAG    5′-PO3-GTTAGCGCACATCGATTTGG    (SEQ ID NO:143) (SEQ ID NO:144)
1.13        5′-PO3-AAATCGATGTGAGCCGACAG    5′-PO3-GTCGGCTCACATCGATTTGG    (SEQ ID NO:145) (SEQ ID NO:146)
1.14        5′-PO3-AAATCGATGTGAGCCGAAAG    5′-PO3-TTCGGCTCACATCGATTTGG    (SEQ ID NO:147) (SEQ ID NO:148)
1.15        5′-PO3-AAATCGATGTGTCGGTAGAG    5′-PO3-CTACCGACACATCGATTTGG    (SEQ ID NO:149) (SEQ ID NO:150)
1.16        5′-PO3-AAATCGATGTGGTTGCCGAG    5′-PO3-CGGCAACCACATCGATTTGG    (SEQ ID NO:151) (SEQ ID NO:152)
1.17        5′-PO3-AAATCGATGTGAGTGCGTAG    5′-PO3-ACGCAGTCACATCGATTTGG    (SEQ ID NO:153) (SEQ ID NO:154)
1.18        5′-PO3-AAATCGATGTGGTTGCCAAG    5′-PO3-TGGCAACCACATCGATTTGG    (SEQ ID NO:155) (SEQ ID NO:156)
1.19        5′-PO3-AAATCGATGTGTGCGAGGAG    5′-PO3-CCTCGCACACATCGATTTGG    (SEQ ID NO:157) (SEQ ID NO:158)
1.20        5′-PO3-AAATCGATGTGGAACACGAG    5′-PO3-CGTGTTCCACATCGATTTGG    (SEQ ID NO:159) (SEQ ID NO:160)
1.21        5′-PO3-AAATCGATGTGCTTGTCGAG    5′-PO3-CGACAAGCACATCGATTTGG    (SEQ ID NO:161) (SEQ ID NO:162)
1.22        5′-PO3-AAATCGATGTGTTCCGGTAG    5′-PO3-ACCGGAACACATCGATTTGG    (SEQ ID NO:163) (SEQ ID NO:164)
1.23        5′-PO3-AAATCGATGTGTGCGAGCAG    5′-PO3-GCTCGCACACATCGATTTGG    (SEQ ID NO:165) (SEQ ID NO:166)
1.24        5′-PO3-AAATCGATGTGGTCAGGTAG    5′-PO3-ACCTGACCACATCGATTTGG    (SEQ ID NO:167) (SEQ ID NO:168)
1.25        5′-PO3-AAATCGATGTGGCCTGTTAG    5′-PO3-AACAGGCCACATCGATTTGG    (SEQ ID NO:169) (SEQ ID NO:170)
1.26        5′-PO3-AAATCGATGTGGAACACCAG    5′-PO3-GGTGTTCCACATCGATTTGG    (SEQ ID NO:171) (SEQ ID NO:172)
1.27        5′-PO3-AAATCGATGTGCTTGTCCAG    5′-PO3-GGACAAGCACATCGATTTGG    (SEQ ID NO:173) (SEQ ID NO:174)
1.28        5′-PO3-AAATCGATGTGTGCGAGAAG    5′-PO3-TCTCGCACACATCGATTTGG    (SEQ ID NO:175) (SEQ ID NO:176)
1.29        5′-PO3-AAATCGATGTGAGTGCGGAG    5′-PO3-CCGCACTCACATCGATTTGG    (SEQ ID NO:177) (SEQ ID NO:178)
1.30        5′-PO3-AAATCGATGTGTTGTCCGAG    5′-PO3-CGGACAACACATCGATTTGG    (SEQ ID NO:179) (SEQ ID NO:180)
1.31        5′-PO3-AAATCGATGTGTGGAACGAG    5′-PO3-CGTTCCACACATCGATTTGG    (SEQ ID NO:181) (SEQ ID NO:182)
1.32        5′-PO3-AAATCGATGTGAGTGCGAAG    5′-PO3-TCGCACTCACATCGATTTGG    (SEQ ID NO:183) (SEQ ID NO:184)
1.33        5′-PO3-AAATCGATGTGTGGAACGAG    5′-PO3-GGTTCCACACATCGATTTGG    (SEQ ID NO:185) (SEQ ID NO:186)
1.34        5′-PO3-AAATCGATGTGTTAGGCGAG    5′-PO3-CGCCTAACACATCGATTTGG    (SEQ ID NO:187) (SEQ ID NO:188)
1.35        5′-PO3-AAATCGATGTGGCCTGTGAG    5′-PO3-CACAGGCCACATCGATTTGG    (SEQ ID NO:189) (SEQ ID NO:190)
1.36        5′-PO3-AAATCGATGTGCTCCTGTAG    5′-PO3-ACAGGAGCACATCGATTTGG    (SEQ ID NO:191) (SEQ ID NO:192)
1.37        5′-PO3-AAATCGATGTGGTCAGGCAG    5′-PO3-GCCTGACCACATCGATTTGG    (SEQ ID NO:193) (SEQ ID NO:194)
1.38        5′-PO3-AAATCGATGTGGTGAGGAAG    5′-PO3-TCCTGACCACATCGATTTGG    (SEQ ID NO:195) (SEQ ID NO:196)
1.39        5′-PO3-AAATCGATGTGGTAGCCGAG    5′-PO3-CGGCTACCACATCGATTTGG    (SEQ ID NO:197) (SEQ ID NO:198)
1.40        5′-PO3-AAATCGATGTGGCCTGTAAG    5′-PO3-TACAGGCCACATCGATTTGG    (SEQ ID NO:199) (SEQ ID NO:200)
1.41        5′-PO3-AAATCGATGTGCTTTCGGAG    5′-PO3-CCGAAAGCACATCGATTTGG    (SEQ ID NO:201) (SEQ ID NO:202)
1.42        5′-PO3-AAATCGATGTGCGTAAGGAG    5′-PO3-CCTTACGCACATCGATTTGG    (SEQ ID NO:203) (SEQ ID NO:204)
1.43        5′-PO3-AAATCGATGTGAGAGCGTAG    5′-PO3-ACGCTCTCACATCGATTTGG    (SEQ ID NO:205) (SEQ ID NO:206)
1.44        5′-PO3-AAATCGATGTGGACGGCAAG    5′-PO3-TGCCGTCCACATCGATTTGG    (SEQ ID NO:207) (SEQ ID NO:208)
1.45        5′-PO3-AAATCGATGTGCTTTCGCAG    5′-PO3-GCGAAAGCACATCGATTTGG    (SEQ ID NO:209) (SEQ ID NO:210)
1.46        5′-PO3-AAATCGATGTGCGTAAGCAG    5′-PO3-GCTTACGCACATCGATTTGG    (SEQ ID NO:211) (SEQ ID NO:212)
1.47        5′-PO3-AAATCGATGTGGCTATGGAG    5′-PO3-CCATAGCCACATCGATTTGG    (SEQ ID NO:213) (SEQ ID NO:214)
1.48        5′-PO3-AAATCGATGTGACTCTGGAG    5′-PO3-CCAGAGTCACATCGATTTGG    (SEQ ID NO:215) (SEQ ID NO:216)
1.49        5′-PO3-AAATCGATGTGCTGGAAAG     5′-PO3-TTCCAGCACATCGATTTGG     (SEQ ID NO:217) (SEQ ID NO:218)
1.50        5′-PO3-AAATCGATGTGCCGAAGTAG    5′-PO3-ACTTCGGCACATCGATTTGG    (SEQ ID NO:219) (SEQ ID NO:220)
1.51        5′-PO3-AAATCGATGTGCTCCTGAAG    5′-PO3-TCAGGAGCACATCGATTTGG    (SEQ ID NO:221) (SEQ ID NO:222)
1.52        5′-PO3-AAATCGATGTGTCCAGTCAG    5′-PO3-GACTGGACACATCGATTTGG    (SEQ ID NO:223) (SEQ ID NO:224)
1.53        5′-PO3-AAATCGATGTGAGAGCGGAG    5′-PO3-CCGCTCTCACATCGATTTGG    (SEQ ID NO:225) (SEQ ID NO:226)
1.54        5′-PO3-AAATCGATGTGAGAGCGAAG    5′-PO3-TCGCTCTCACATCGATTTGG    (SEQ ID NO:227) (SEQ ID NO:228)
1.55        5′-PO3-AAATCGATGTGCCGAAGCAG    5′-PO3-GCTTCGGCACATCGATTTGG    (SEQ ID NO:229) (SEQ ID NO:230)
1.56        5′-PO3-AAATCGATGTGCCGAAGCAG    5′-PO3-GCTTCGGCACATCGATTTGG    (SEQ ID NO:231) (SEQ ID NO:232)
1.57        5′-PO3-AAATCGATGTGTGTTCCGAG    5′-PO3-CGGAACACACATCGATTTGG    (SEQ ID NO:233) (SEQ ID NO:234)
1.58        5′-PO3-AAATCGATGTGTCTGGCGAG    5′-PO3-CGCCAGACACATCGATTTGG    (SEQ ID NO:235) (SEQ ID NO:236)
1.59        5′-PO3-AAATCGATGTGCTATCGGAG    5′-PO3-CCGATAGCACATCGATTTGG    (SEQ ID NO:237) (SEQ ID NO:238)
1.60        5′-PO3-AAATCGATGTGCGAAAGGAG    5′-PO3-CCTTTCGCACATCGATTTGG    (SEQ ID NO:239) (SEQ ID NO:240)
1.61        5′-PO3-AAATCGATGTGCCGAAGAAG    5′-PO3-TCTTCGGCACATCGATTTGG    (SEQ ID NO:241) (SEQ ID NO:242)
1.62        5′-PO3-AAATCGATGTGGTTGCAGAG    5′-PO3-CTGCAACCACATCGATTTGG    (SEQ ID NO:243) (SEQ ID NO:244)
1.63        5′-PO3-AAATCGATGTGGATGGTGAG    5′-PO3-CACCATCCACATCGATTTGG    (SEQ ID NO:245) (SEQ ID NO:246)
1.64        5′-PO3-AAATCGATGTGCTATCGCAG    5′-PO3-GCGATAGCACATCGATTTGG    (SEQ ID NO:247) (SEQ ID NO:248)
1.65        5′-PO3-AAATCGATGTGCGAAAGCAG    5′-PO3-GCTTTCGCACATCGATTTGG    (SEQ ID NO:249) (SEQ ID NO:250)
1.66        5′-PO3-AAATCGATGTGACACTGGAG    5′-PO3-CCAGTGTCACATCGATTTGG    (SEQ ID NO:251) (SEQ ID NO:252)
1.67        5′-PO3-AAATCGATGTGTCTGGCAAG    5′-PO3-TGCCAGACACATCGATTTGG    (SEQ ID NO:253) (SEQ ID NO:254)
1.68        5′-PO3-AAATCGATGTGGATGGTCAG    5′-PO3-GACCATCCACATCGATTTGG    (SEQ ID NO:255) (SEQ ID NO:256)
1.69        5′-PO3-AAATCGATGTGGTTGCACAG    5′-PO3-GTGCAACCACATCGATTTGG    (SEQ ID NO:257) (SEQ ID NO:258)
1.70        5′-PO3-AAATCGATGTGGGCATCGAG    5′-PO3-CGATGCCCCATCCGATTTGG    (SEQ ID NO:259) (SEQ ID NO:260)
1.71        5′-PO3-AAATCGATGTGTGCCTCCAG    5′-PO3-GGAGGCACACATCGATTTGG    (SEQ ID NO:261) (SEQ ID NO:262)
1.72        5′-PO3-AAATCGATGTGTGCCTCAAG    5′-PO3-TGAGGCACACATCGATTTGG    (SEQ ID NO:263) (SEQ ID NO:264)
1.73        5′-PO3-AAATCGATGTGGGCATCCAG    5′-PO3-GGATGCCCACATCGATTTGG    (SEQ ID NO:265) (SEQ ID NO:266)
1.74        5′-PO3-AAATCGATGTGGGCATCAAG    5′-PO3-TGATGCCCACATCGATTTGG    (SEQ ID NO:267) (SEQ ID NO:268)
1.75        5′-PO3-AAATCGATGTGCCTGTCGAG    5′-PO3-CGACAGGCACATCGATTTGG    (SEQ ID NO:269) (SEQ ID NO:270)
1.76        5′-PO3-AAATCGATGTGGACGGATAG    5′-PO3-ATCCGTCCACATCGATTTGG    (SEQ ID NO:271) (SEQ ID NO:272)
1.77        5′-PO3-AAATCGATGTGCCTGTCCAG    5′-PO3-GGACAGGCACATCGATTTGG    (SEQ ID NO:273) (SEQ ID NO:274)
1.78        5′-PO3-AAATCGATGTGAAGCACGAG    5′-PO3-CGTGCTTCACATCGATTTGG    (SEQ ID NO:275) (SEQ ID NO:276)
1.79        5′-PO3-AAATCGATGTGCCTGTCAAG    5′-PO3-TGACAGGCACATCGATTTGG    (SEQ ID NO:277) (SEQ ID NO:278)
1.80        5′-PO3-AAATCGATGTGAAGCACCAG    5′-PO3-GGTGCTTCACATCGATTTGG    (SEQ ID NO:279) (SEQ ID NO:280)
1.81        5′-PO3-AAATCGATGTGCCTTCGTAG    5′-PO3-ACGAAGGCACATCGATTTGG    (SEQ ID NO:281) (SEQ ID NO:282)
1.82        5′-PO3-AAATCGATGTGTCGTCCGAG    5′-PO3-CGGACGACACATCGATTTGG    (SEQ ID NO:283) (SEQ ID NO:284)
1.83        5′-PO3-AAATCGATGTGGAGTCTGAG    5′-PO3-CAGACTCCACATCGATTTGG    (SEQ ID NO:285) (SEQ ID NO:286)
1.84        5′-PO3-AAATCGATGTGTGATCCGAG    5′-PO3-CGGATCACACATCGATTTGG    (SEQ ID NO:287) (SEQ ID NO:288)
1.85        5′-PO3-AAATCGATGTGTCAGGCGAG    5′-PO3-CGCCTGACACATCGATTTGG    (SEQ ID NO:289) (SEQ ID NO:290)
1.86        5′-PO3-AAATCGATGTGTCGTCCAAG    5′-PO3-TGGACGACACATCGATTTGG    (SEQ ID NO:291) (SEQ ID NO:292)
1.87        5′-PO3-AAATCGATGTGGACGGAGAG    5′-PO3-CTCCGTCCACATCGATTTGG    (SEQ ID NO:293) (SEQ ID NO:294)
1.88        5′-PO3-AAATCGATGTGGTAGCAGAG    5′-PO3-CTGCTACCACATCGATTTGG    (SEQ ID NO:295) (SEQ ID NO:296)
1.89        5′-PO3-AAATCGATGTGGCTGTGTAG    5′-PO3-ACACAGCCACATCGATTTGG    (SEQ ID NO:297) (SEQ ID NO:298)
1.90        5′-PO3-AAATCGATGTGGACGGACAG    5′-PO3-GTCCGTCCACATCGATTTGG    (SEQ ID NO:299) (SEQ ID NO:300)
1.91        5′-PO3-AAATCGATGTGTCAGGCAAG    5′-PO3-TGCCTGACACATCGATTTGG    (SEQ ID NO:301) (SEQ ID NO:302)
1.92        5′-PO3-AAATCGATGTGGCTCGAAAG    5′-PO3-TTCGAGCCACATCGATTGG     (SEQ ID NO:303) (SEQ ID NO:304)
1.93        5′-PO3-AAATCGATGTGCCTTCGGAG    5′-PO3-CCGAAGGCACATCGATTTGG    (SEQ ID NO:305) (SEQ ID NO:306)
1.94        5′-PO3-AAATCGATGTGGTAGCACAG    5′-PO3-GTGCTACCACATCGATTTGG    (SEQ ID NO:307) (SEQ ID NO:308)
1.95        5′-PO3-AAATCGATGTGGAAGGTCAG    5′-PO3-GACCTTCCACATCGATTTGG    (SEQ ID NO:309) (SEQ ID NO:310)
1.96        5′-PO3-AAATCGATGTGGTGCTGTAG    5′-PO3-ACAGCACCACATCGATTTGG    (SEQ ID NO:311) (SEQ ID NO:312)
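As printed, the Cycle 1 tag pairs in Table 3 appear to follow a regular layout. Reading both strands 5′→3′, the bottom strand is the reverse complement of the top strand with its first two bases removed and GG appended, so each duplex carries a 2-nt 3′ overhang on each strand. The Python sketch below checks a pair against that pattern. The pattern and helper names are inferred from the table, not a design rule stated in the text. The check can flag pairs worth comparing against the Sequence Listing; for example, tag 1.38 as printed does not match.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def fits_tag_pattern(top: str, bottom: str, overhang: str = "GG", offset: int = 2) -> bool:
    """True if bottom == reverse_complement(top)[offset:] + overhang (both 5'->3')."""
    return bottom == reverse_complement(top)[offset:] + overhang


# (tag number, top strand, bottom strand) transcribed from Table 3.
pairs = [
    ("1.1", "AAATCGATGTGGTCACTCAG", "GAGTGACCACATCGATTTGG"),
    ("1.2", "AAATCGATGTGGACTAGGAG", "CCTAGTCCACATCGATTTGG"),
    ("1.38", "AAATCGATGTGGTGAGGAAG", "TCCTGACCACATCGATTTGG"),
]
for tag, top, bottom in pairs:
    verdict = "fits pattern" if fits_tag_pattern(top, bottom) else "check against Sequence Listing"
    print(tag, verdict)
```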

TABLE 4
Oligonucleotide tags used in cycle 2:

Tag Number  Top strand sequence     Bottom strand sequence  SEQ ID NOs (top, bottom)
2.1         5′-PO3-GTTGCCTGT        5′-PO3-AGGCAACCT        (SEQ ID NO:313) (SEQ ID NO:314)
2.2         5′-PO3-CAGGACGGT        5′-PO3-CGTCCTGCT        (SEQ ID NO:315) (SEQ ID NO:316)
2.3         5′-PO3-AGACGTGGT        5′-PO3-CACGTCTCT        (SEQ ID NO:317) (SEQ ID NO:318)
2.4         5′-PO3-CAGGACCGT        5′-PO3-GGTCCTGCT        (SEQ ID NO:319) (SEQ ID NO:320)
2.5         5′-PO3-CAGGACAGT        5′-PO3-TGTCCTGCT        (SEQ ID NO:321) (SEQ ID NO:322)
2.6         5′-PO3-CACTCTGGT        5′-PO3-CAGAGTGCT        (SEQ ID NO:323) (SEQ ID NO:324)
2.7         5′-PO3-GACGGCTGT        5′-PO3-AGCCGTCCT        (SEQ ID NO:325) (SEQ ID NO:326)
2.8         5′-PO3-CACTCTCGT        5′-PO3-GAGAGTGCT        (SEQ ID NO:327) (SEQ ID NO:328)
2.9         5′-PO3-GTAGCCTGT        5′-PO3-AGGCTACCT        (SEQ ID NO:329) (SEQ ID NO:330)
2.10        5′-PO3-GCCACTTGT        5′-PO3-AAGTGGCCT        (SEQ ID NO:331) (SEQ ID NO:332)
2.11        5′-PO3-CATCGCTGT        5′-PO3-AGCGATGCT        (SEQ ID NO:333) (SEQ ID NO:334)
2.12        5′-PO3-CACTGGTGT        5′-PO3-ACCAGTGCT        (SEQ ID NO:335) (SEQ ID NO:336)
2.13        5′-PO3-GCCACTGGT        5′-PO3-CAGTGGCCT        (SEQ ID NO:337) (SEQ ID NO:338)
2.14        5′-PO3-TCTGGCTGT        5′-PO3-AGCCAGACT        (SEQ ID NO:339) (SEQ ID NO:340)
2.15        5′-PO3-GCCACTCGT        5′-PO3-GAGTGGCCT        (SEQ ID NO:341) (SEQ ID NO:342)
2.16        5′-PO3-TGCCTCTGT        5′-PO3-AGAGGCACT        (SEQ ID NO:343) (SEQ ID NO:344)
2.17        5′-PO3-CATCGCAGT        5′-PO3-TGCGATGCT        (SEQ ID NO:345) (SEQ ID NO:346)
2.18        5′-PO3-CAGGAAGGT        5′-PO3-CTTCCTGCT        (SEQ ID NO:347) (SEQ ID NO:348)
2.19        5′-PO3-GGCATCTGT        5′-PO3-AGATGCCCT        (SEQ ID NO:349) (SEQ ID NO:350)
2.20        5′-PO3-CGGTGGTGT        5′-PO3-AGCACCGCT        (SEQ ID NO:351) (SEQ ID NO:352)
2.21        5′-PO3-CACTGGCGT        5′-PO3-GCCAGTGCT        (SEQ ID NO:353) (SEQ ID NO:354)
2.22        5′-PO3-TCTCCTCGT        5′-PO3-GAGGAGACT        (SEQ ID NO:355) (SEQ ID NO:356)
2.23        5′-PO3-CCTGTCTGT        5′-PO3-AGACAGGCT        (SEQ ID NO:357) (SEQ ID NO:358)
2.24        5′-PO3-CAACGCTGT        5′-PO3-AGCGTTGCT        (SEQ ID NO:359) (SEQ ID NO:360)
2.25        5′-PO3-TGCCTCGGT        5′-PO3-CGAGGCACT        (SEQ ID NO:361) (SEQ ID NO:362)
2.26        5′-PO3-ACACTGCGT        5′-PO3-GCAGTGTCT        (SEQ ID NO:363) (SEQ ID NO:364)
2.27        5′-PO3-TCGTCCTGT        5′-PO3-AGGACGACT        (SEQ ID NO:365) (SEQ ID NO:366)
2.28        5′-PO3-GCTGCCAGT        5′-PO3-TGGCAGCCT        (SEQ ID NO:367) (SEQ ID NO:368)
2.29        5′-PO3-TCAGCCTGT        5′-PO3-AGCCTGACT        (SEQ ID NO:369) (SEQ ID NO:370)
2.30        5′-PO3-GCCAGGTGT        5′-PO3-ACCTGGCCT        (SEQ ID NO:371) (SEQ ID NO:372)
2.31        5′-PO3-CGGACCTGT        5′-PO3-AGGTCCGCT        (SEQ ID NO:373) (SEQ ID NO:374)
2.32        5′-PO3-CAACGCAGT        5′-PO3-TGCGTTGCT        (SEQ ID NO:375) (SEQ ID NO:376)
2.33        5′-PO3-CACACGAGT        5′-PO3-TCGTGTGCT        (SEQ ID NO:377) (SEQ ID NO:378)
2.34        5′-PO3-ATGGCCTGT        5′-PO3-AGGCCATCT        (SEQ ID NO:379) (SEQ ID NO:380)
2.35        5′-PO3-CCAGTCTGT        5′-PO3-AGACTGGCT        (SEQ ID NO:381) (SEQ ID NO:382)
2.36        5′-PO3-GCCAGGAGT        5′-PO3-TCCTGGCCT        (SEQ ID NO:383) (SEQ ID NO:384)
2.37        5′-PO3-CGGACCAGT        5′-PO3-TGGTCCGCT        (SEQ ID NO:385) (SEQ ID NO:386)
2.38        5′-PO3-CCTTCGCGT        5′-PO3-GCGAAGGCT        (SEQ ID NO:387) (SEQ ID NO:388)
2.39        5′-PO3-GCAGCCAGT        5′-PO3-TGGCTGCCT        (SEQ ID NO:389) (SEQ ID NO:390)
2.40        5′-PO3-CCAGTCGGT        5′-PO3-CGACTGGCT        (SEQ ID NO:391) (SEQ ID NO:392)
2.41        5′-PO3-ACTGAGCGT        5′-PO3-GCTGAGTCT        (SEQ ID NO:393) (SEQ ID NO:394)
2.42        5′-PO3-CCAGTCCGT        5′-PO3-GGACTGGCT        (SEQ ID NO:395) (SEQ ID NO:396)
2.43        5′-PO3-CCAGTCAGT        5′-PO3-TGACTGGCT        (SEQ ID NO:397) (SEQ ID NO:398)
2.44        5′-PO3-CATCGAGGT        5′-PO3-CTCGATGCT        (SEQ ID NO:399) (SEQ ID NO:400)
2.45        5′-PO3-CCATCGTGT        5′-PO3-ACGATGGCT        (SEQ ID NO:401) (SEQ ID NO:402)
2.46        5′-PO3-GTGCTGCGT        5′-PO3-GCAGCACCT        (SEQ ID NO:403) (SEQ ID NO:404)
2.47        5′-PO3-GACTACGGT        5′-PO3-CGTAGTCCT        (SEQ ID NO:405) (SEQ ID NO:406)
2.48        5′-PO3-GTGCTGAGT        5′-PO3-TCAGCACCT        (SEQ ID NO:407) (SEQ ID NO:408)
2.49        5′-PO3-GCTGCATGT        5′-PO3-ATGGAGCCT        (SEQ ID NO:409) (SEQ ID NO:410)
2.50        5′-PO3-GAGTGGTGT        5′-PO3-ACCACTCCT        (SEQ ID NO:411) (SEQ ID NO:412)
2.51        5′-PO3-GACTACCGT        5′-PO3-GGTAGTCCT        (SEQ ID NO:413) (SEQ ID NO:414)
2.52        5′-PO3-CGGTGATGT        5′-PO3-ATCACCGCT        (SEQ ID NO:415) (SEQ ID NO:416)
2.53        5′-PO3-TGCGACTGT        5′-PO3-AGTCGCACT        (SEQ ID NO:417) (SEQ ID NO:418)
2.54        5′-PO3-TCTGGAGGT        5′-PO3-CTCCAGACT        (SEQ ID NO:419) (SEQ ID NO:420)
2.55        5′-PO3-AGCACTGGT        5′-PO3-CAGTGCTCT        (SEQ ID NO:421) (SEQ ID NO:422)
2.56        5′-PO3-TCGCTTGGT        5′-PO3-CAAGCGACT        (SEQ ID NO:423) (SEQ ID NO:424)
2.57        5′-PO3-AGCACTCGT        5′-PO3-GAGTGCTCT        (SEQ ID NO:425) (SEQ ID NO:426)
2.58        5′-PO3-GCGATTGGT        5′-PO3-CAATCGCCT        (SEQ ID NO:427) (SEQ ID NO:428)
2.59        5′-PO3-CCATCGCGT        5′-PO3-GCGATGGCT        (SEQ ID NO:429) (SEQ ID NO:430)
2.60        5′-PO3-TCGCTTCGT        5′-PO3-GAAGCGACT        (SEQ ID NO:431) (SEQ ID NO:432)
2.61        5′-PO3-AGTGCCTGT        5′-PO3-AGGCACTGT        (SEQ ID NO:433) (SEQ ID NO:434)
2.62        5′-PO3-GGCATAGGT        5′-PO3-CTATGCCCT        (SEQ ID NO:435) (SEQ ID NO:436)
2.63        5′-PO3-GCGATTCGT        5′-PO3-GAATCGCCT        (SEQ ID NO:437) (SEQ ID NO:438)
2.64        5′-PO3-TGCGACGGT        5′-PO3-CGTCGCACT        (SEQ ID NO:439) (SEQ ID NO:440)
2.65        5′-PO3-GAGTGGCGT        5′-PO3-GCCACTCCT        (SEQ ID NO:441) (SEQ ID NO:442)
2.66        5′-PO3-CGGTGAGGT        5′-PO3-CTCACCGCT        (SEQ ID NO:443) (SEQ ID NO:444)
2.67        5′-PO3-GCTGCAAGT        5′-PO3-TTGCAGCCT        (SEQ ID NO:445) (SEQ ID NO:446)
2.68        5′-PO3-TTCCGCTGT        5′-PO3-AGCGGAACT        (SEQ ID NO:447) (SEQ ID NO:448)
2.69        5′-PO3-GAGTGGAGT        5′-PO3-TCCACTCCT        (SEQ ID NO:449) (SEQ ID NO:450)
2.70        5′-PO3-ACAGAGCGT        5′-PO3-GCTCTGTCT        (SEQ ID NO:451) (SEQ ID NO:452)
2.71        5′-PO3-TGCGACCGT        5′-PO3-GGTCGCACT        (SEQ ID NO:453) (SEQ ID NO:454)
2.72        5′-PO3-CCTGTAGGT        5′-PO3-CTACAGGGT        (SEQ ID NO:455) (SEQ ID NO:456)
2.73        5′-PO3-TAGCCGTGT        5′-PO3-ACGGCTACT        (SEQ ID NO:457) (SEQ ID NO:458)
2.74        5′-PO3-TGCGACAGT        5′-PO3-TGTCGCAGT        (SEQ ID NO:459) (SEQ ID NO:460)
2.75        5′-PO3-GGTCTGTGT        5′-PO3-ACAGACCCT        (SEQ ID NO:461) (SEQ ID NO:462)
2.76        5′-PO3-CGGTGAAGT        5′-PO3-TTCACCGCT        (SEQ ID NO:463) (SEQ ID NO:464)
2.77        5′-PO3-CAACGAGGT        5′-PO3-CTCGTTGCT        (SEQ ID NO:465) (SEQ ID NO:466)
2.78        5′-PO3-GGAGCATGT        5′-PO3-ATGCTGCCT        (SEQ ID NO:467) (SEQ ID NO:468)
2.79        5′-PO3-TCGTCAGGT        5′-PO3-CTGACGACT        (SEQ ID NO:469) (SEQ ID NO:470)
2.80        5′-PO3-AGTGCCAGT        5′-PO3-TGGCACTCT        (SEQ ID NO:471) (SEQ ID NO:472)
2.81        5′-PO3-TAGAGGCGT        5′-PO3-GCCTCTACT        (SEQ ID NO:473) (SEQ ID NO:474)
2.82        5′-PO3-GTCAGCGGT        5′-PO3-CGCTGACCT        (SEQ ID NO:475) (SEQ ID NO:476)
2.83        5′-PO3-TCAGGAGGT        5′-PO3-CTCCTGACT        (SEQ ID NO:477) (SEQ ID NO:478)
2.84        5′-PO3-AGCAGGTGT        5′-PO3-ACCTGCTCT        (SEQ ID NO:479) (SEQ ID NO:480)
2.85        5′-PO3-TTCCGCAGT        5′-PO3-TGCGGAACT        (SEQ ID NO:481) (SEQ ID NO:482)
2.86        5′-PO3-GTCAGCCGT        5′-PO3-GGCTGACCT        (SEQ ID NO:483) (SEQ ID NO:484)
2.87        5′-PO3-GGTCTGCGT        5′-PO3-GCAGACCCT        (SEQ ID NO:485) (SEQ ID NO:486)
2.88        5′-PO3-TAGCCGAGT        5′-PO3-TCGGCTACT        (SEQ ID NO:487) (SEQ ID NO:488)
2.89        5′-PO3-GTCAGCAGT        5′-PO3-TGCTGACCT        (SEQ ID NO:489) (SEQ ID NO:490)
2.90        5′-PO3-GGTCTGAGT        5′-PO3-TCAGACCCT        (SEQ ID NO:491) (SEQ ID NO:492)
2.91        5′-PO3-CGGACAGGT        5′-PO3-CTGTCCGCT        (SEQ ID NO:493) (SEQ ID NO:494)
2.92        5′-PO3-TTAGCCGGT        5′-PO3-CGGCTAACT        (SEQ ID NO:495) (SEQ ID NO:496)
2.93        5′-PO3-GAGACGAGT        5′-PO3-TCGTCTCCT        (SEQ ID NO:497) (SEQ ID NO:498)
2.94        5′-PO3-CGTAACCGT        5′-PO3-GGTTACGCT        (SEQ ID NO:499) (SEQ ID NO:500)
2.95        5′-PO3-TTGGCGTGT        5′-PO3-ACGCCAACT        (SEQ ID NO:501) (SEQ ID NO:502)
2.96        5′-PO3-ATGGCAGGT        5′-PO3-CTGCCATCT        (SEQ ID NO:503) (SEQ ID NO:504)
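The same layout also accounts for how consecutive tags join. Ligation requires the 2-nt 3′ overhang left on the upstream top strand to be complementary to the 2-nt 3′ overhang on the downstream bottom strand. The Python sketch below checks that condition, with all sequences read 5′→3′. It tests tag 1.1 followed by tag 2.1, and tag 2.1 followed by tag 3.1 from Table 5. The overhang length of 2 and the helper names are assumptions drawn from the printed sequences, not from the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def overhangs_compatible(upstream_top: str, downstream_bottom: str, k: int = 2) -> bool:
    """True if the upstream top 3' overhang can pair with the downstream bottom 3' overhang."""
    return reverse_complement(upstream_top[-k:]) == downstream_bottom[-k:]


tag_1_1_top = "AAATCGATGTGGTCACTCAG"   # Table 3
tag_2_1_top = "GTTGCCTGT"              # Table 4
tag_2_1_bottom = "AGGCAACCT"           # Table 4
tag_3_1_bottom = "GTAGCTGAC"           # Table 5

print(overhangs_compatible(tag_1_1_top, tag_2_1_bottom))  # AG pairs with CT
print(overhangs_compatible(tag_2_1_top, tag_3_1_bottom))  # GT pairs with AC
```

Because every Cycle 1 top strand ends in AG and every Cycle 2 bottom strand is expected to end in CT, any Cycle 1 tag should be able to ligate to any Cycle 2 tag. That property is what a split-and-pool synthesis requires.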

TABLE 5 Oligonucleotide tags used in cycle 3
Tag number  Top strand sequence  Bottom strand sequence
3.1  5′-PO3-CAG CTA CGA (SEQ ID NO:505)  5′-PO3-GTA GCT GAC (SEQ ID NO:506)
3.2  5′-PO3-CTC CTG CGA (SEQ ID NO:507)  5′-PO3-GCA GGA GAC (SEQ ID NO:508)
3.3  5′-PO3-GCT GCC TGA (SEQ ID NO:509)  5′-PO3-AGG CAG CAC (SEQ ID NO:510)
3.4  5′-PO3-CAG GAA CGA (SEQ ID NO:511)  5′-PO3-GTT CCT GAC (SEQ ID NO:512)
3.5  5′-PO3-CAC ACG CGA (SEQ ID NO:513)  5′-PO3-GCG TGT GAC (SEQ ID NO:514)
3.6  5′-PO3-GCA GCC TGA (SEQ ID NO:515)  5′-PO3-AGG CTG CAC (SEQ ID NO:516)
3.7  5′-PO3-CTG AAC GGA (SEQ ID NO:517)  5′-PO3-CGT TCA GAC (SEQ ID NO:518)
3.8  5′-PO3-CTG AAC CGA (SEQ ID NO:519)  5′-PO3-GGT TCA GAC (SEQ ID NO:520)
3.9  5′-PO3-TCT GGA CGA (SEQ ID NO:521)  5′-PO3-GTC CAG AAC (SEQ ID NO:522)
3.10  5′-PO3-TGC CTA CGA (SEQ ID NO:523)  5′-PO3-GTA GGC AAC (SEQ ID NO:524)
3.11  5′-PO3-GGC ATA CGA (SEQ ID NO:525)  5′-PO3-GTA TGC CAC (SEQ ID NO:526)
3.12  5′-PO3-CGG TGA CGA (SEQ ID NO:527)  5′-PO3-GTC ACC GAC (SEQ ID NO:528)
3.13  5′-PO3-CAA CGA CGA (SEQ ID NO:529)  5′-PO3-GTC GTT GAC (SEQ ID NO:530)
3.14  5′-PO3-CTC CTC TGA (SEQ ID NO:531)  5′-PO3-AGA GGA GAC (SEQ ID NO:532)
3.15  5′-PO3-TCA GGA CGA (SEQ ID NO:533)  5′-PO3-GTC CTG AAC (SEQ ID NO:534)
3.16  5′-PO3-AAA GGC GGA (SEQ ID NO:535)  5′-PO3-GGC CTT TAC (SEQ ID NO:536)
3.17  5′-PO3-CTC CTC GGA (SEQ ID NO:537)  5′-PO3-CGA GGA GAC (SEQ ID NO:538)
3.18  5′-PO3-CAG ATG CGA (SEQ ID NO:539)  5′-PO3-GCA TCT GAC (SEQ ID NO:540)
3.19  5′-PO3-GCA GCA AGA (SEQ ID NO:541)  5′-PO3-TTG CTG CAC (SEQ ID NO:542)
3.20  5′-PO3-GTG GAG TGA (SEQ ID NO:543)  5′-PO3-ACT CCA CAC (SEQ ID NO:544)
3.21  5′-PO3-CCA GTA GGA (SEQ ID NO:545)  5′-PO3-CTA CTG GAC (SEQ ID NO:546)
3.22  5′-PO3-ATG GCA CGA (SEQ ID NO:547)  5′-PO3-GTG CCA TAC (SEQ ID NO:548)
3.23  5′-PO3-GGA CTG TGA (SEQ ID NO:549)  5′-PO3-ACA GTC CAC (SEQ ID NO:550)
3.24  5′-PO3-CCG AAC TGA (SEQ ID NO:551)  5′-PO3-AGT TCG GAC (SEQ ID NO:552)
3.25  5′-PO3-CTC CTC AGA (SEQ ID NO:553)  5′-PO3-TGA GGA GAC (SEQ ID NO:554)
3.26  5′-PO3-CAC TGC TGA (SEQ ID NO:555)  5′-PO3-AGC AGT GAC (SEQ ID NO:556)
3.27  5′-PO3-AGC AGG CGA (SEQ ID NO:557)  5′-PO3-GCC TGC TAC (SEQ ID NO:558)
3.28  5′-PO3-AGC AGG AGA (SEQ ID NO:559)  5′-PO3-TCC TGC TAC (SEQ ID NO:560)
3.29  5′-PO3-AGA GCC AGA (SEQ ID NO:561)  5′-PO3-TGG CTC TAC (SEQ ID NO:562)
3.30  5′-PO3-GTC GTT GGA (SEQ ID NO:563)  5′-PO3-CAA CGA CAC (SEQ ID NO:564)
3.31  5′-PO3-CCG AAC GGA (SEQ ID NO:565)  5′-PO3-CGT TCG GAC (SEQ ID NO:566)
3.32  5′-PO3-CAC TGC GGA (SEQ ID NO:567)  5′-PO3-CGC AGT GAC (SEQ ID NO:568)
3.33  5′-PO3-GTG GAG CGA (SEQ ID NO:569)  5′-PO3-GCT CCA CAC (SEQ ID NO:570)
3.34  5′-PO3-GTG GAG AGA (SEQ ID NO:571)  5′-PO3-TCT CCA CAC (SEQ ID NO:572)
3.35  5′-PO3-GGA CTG CGA (SEQ ID NO:573)  5′-PO3-GCA GTC CAC (SEQ ID NO:574)
3.36  5′-PO3-CCG AAC CGA (SEQ ID NO:575)  5′-PO3-GGT TCG GAC (SEQ ID NO:576)
3.37  5′-PO3-CAC TGC CGA (SEQ ID NO:577)  5′-PO3-GGC AGT GAC (SEQ ID NO:578)
3.38  5′-PO3-CGA AAC GGA (SEQ ID NO:579)  5′-PO3-CGT TTC GAC (SEQ ID NO:580)
3.39  5′-PO3-GGA CTG AGA (SEQ ID NO:581)  5′-PO3-TCA GTC CAC (SEQ ID NO:582)
3.40  5′-PO3-CCG AAC AGA (SEQ ID NO:583)  5′-PO3-TGT TCG GAC (SEQ ID NO:584)
3.41  5′-PO3-CGA AAC CGA (SEQ ID NO:585)  5′-PO3-GGT TTC GAC (SEQ ID NO:586)
3.42  5′-PO3-CTG GCT TGA (SEQ ID NO:587)  5′-PO3-AAG CCA GAC (SEQ ID NO:588)
3.43  5′-PO3-CAC ACC TGA (SEQ ID NO:589)  5′-PO3-AGG TGT GAC (SEQ ID NO:590)
3.44  5′-PO3-AAC GAC CGA (SEQ ID NO:591)  5′-PO3-GGT CGT TAC (SEQ ID NO:592)
3.45  5′-PO3-ATC CAG CGA (SEQ ID NO:593)  5′-PO3-GCT GGA TAC (SEQ ID NO:594)
3.46  5′-PO3-TGC GAA GGA (SEQ ID NO:595)  5′-PO3-CTT CGC AAC (SEQ ID NO:596)
3.47  5′-PO3-TGC GAA CGA (SEQ ID NO:597)  5′-PO3-GTT CGC AAC (SEQ ID NO:598)
3.48  5′-PO3-CTG GCT GGA (SEQ ID NO:599)  5′-PO3-CAG CCA GAC (SEQ ID NO:600)
3.49  5′-PO3-CAC ACC GGA (SEQ ID NO:601)  5′-PO3-CGG TGT GAC (SEQ ID NO:602)
3.50  5′-PO3-AGT GCA GGA (SEQ ID NO:603)  5′-PO3-CTG CAC TAC (SEQ ID NO:604)
3.51  5′-PO3-GAC CGT TGA (SEQ ID NO:605)  5′-PO3-AAC GGT CAC (SEQ ID NO:606)
3.52  5′-PO3-GGT GAG TGA (SEQ ID NO:607)  5′-PO3-ACT CAC CAC (SEQ ID NO:608)
3.53  5′-PO3-CCT TCC TGA (SEQ ID NO:609)  5′-PO3-AGG AAG GAC (SEQ ID NO:610)
3.54  5′-PO3-CTG GCT AGA (SEQ ID NO:611)  5′-PO3-TAG CCA GAC (SEQ ID NO:612)
3.55  5′-PO3-CAC ACC AGA (SEQ ID NO:613)  5′-PO3-TGG TGT GAC (SEQ ID NO:614)
3.56  5′-PO3-AGC GGT AGA (SEQ ID NO:615)  5′-PO3-TAC CGC TAC (SEQ ID NO:616)
3.57  5′-PO3-GTC AGA GGA (SEQ ID NO:617)  5′-PO3-CTC TGA CAC (SEQ ID NO:618)
3.58  5′-PO3-TTC CGA CGA (SEQ ID NO:619)  5′-PO3-GTC GGA AAC (SEQ ID NO:620)
3.59  5′-PO3-AGG CGT AGA (SEQ ID NO:621)  5′-PO3-TAC GCC TAC (SEQ ID NO:622)
3.60  5′-PO3-CTC GAC TGA (SEQ ID NO:623)  5′-PO3-AGT CGA GAC (SEQ ID NO:624)
3.61  5′-PO3-TAC GCT GGA (SEQ ID NO:625)  5′-PO3-CAG CGT AAC (SEQ ID NO:626)
3.62  5′-PO3-GTT CGG TGA (SEQ ID NO:627)  5′-PO3-ACC GAA CAC (SEQ ID NO:628)
3.63  5′-PO3-GCC AGC AGA (SEQ ID NO:629)  5′-PO3-TGC TGG CAC (SEQ ID NO:630)
3.64  5′-PO3-GAC CGT AGA (SEQ ID NO:631)  5′-PO3-TAC GGT CAC (SEQ ID NO:632)
3.65  5′-PO3-GTG CTC TGA (SEQ ID NO:633)  5′-PO3-AGA GCA CAC (SEQ ID NO:634)
3.66  5′-PO3-GGT GAG CGA (SEQ ID NO:635)  5′-PO3-GCT CAC CAC (SEQ ID NO:636)
3.67  5′-PO3-GGT GAG AGA (SEQ ID NO:637)  5′-PO3-TCT CAC CAC (SEQ ID NO:638)
3.68  5′-PO3-CCT TCC AGA (SEQ ID NO:639)  5′-PO3-TGG AAG GAC (SEQ ID NO:640)
3.69  5′-PO3-CTC CTA CGA (SEQ ID NO:641)  5′-PO3-GTA GGA GAC (SEQ ID NO:642)
3.70  5′-PO3-CTC GAC GGA (SEQ ID NO:643)  5′-PO3-CGT CGA GAC (SEQ ID NO:644)
3.71  5′-PO3-GCC GTT TGA (SEQ ID NO:645)  5′-PO3-AAA CGG CAC (SEQ ID NO:646)
3.72  5′-PO3-GCG GAG TGA (SEQ ID NO:647)  5′-PO3-ACT CCG CAC (SEQ ID NO:648)
3.73  5′-PO3-CGT GCT TGA (SEQ ID NO:649)  5′-PO3-AAG CAC GAC (SEQ ID NO:650)
3.74  5′-PO3-CTC GAC CGA (SEQ ID NO:651)  5′-PO3-GGT CGA GAC (SEQ ID NO:652)
3.75  5′-PO3-AGA GCA GGA (SEQ ID NO:653)  5′-PO3-CTG GTC TAC (SEQ ID NO:654)
3.76  5′-PO3-GTG CTC GGA (SEQ ID NO:655)  5′-PO3-CGA GCA CAC (SEQ ID NO:656)
3.77  5′-PO3-CTC GAC AGA (SEQ ID NO:657)  5′-PO3-TGT CGA GAC (SEQ ID NO:658)
3.78  5′-PO3-GGA GAG TGA (SEQ ID NO:659)  5′-PO3-ACT CTC CAC (SEQ ID NO:660)
3.79  5′-PO3-AGG CTG TGA (SEQ ID NO:661)  5′-PO3-ACA GCC TAC (SEQ ID NO:662)
3.80  5′-PO3-AGA GCA CGA (SEQ ID NO:663)  5′-PO3-GTG CTC TAC (SEQ ID NO:664)
3.81  5′-PO3-CCA TCC TGA (SEQ ID NO:665)  5′-PO3-AGG ATG GAC (SEQ ID NO:666)
3.82  5′-PO3-GTT CGG AGA (SEQ ID NO:667)  5′-PO3-TCC GAA CAC (SEQ ID NO:668)
3.83  5′-PO3-TGG TAG CGA (SEQ ID NO:669)  5′-PO3-GCT ACC AAC (SEQ ID NO:670)
3.84  5′-PO3-GTG CTC CGA (SEQ ID NO:671)  5′-PO3-GGA GCA CAC (SEQ ID NO:672)
3.85  5′-PO3-GTG CTC AGA (SEQ ID NO:673)  5′-PO3-TGA GCA CAC (SEQ ID NO:674)
3.86  5′-PO3-GCC GTT GGA (SEQ ID NO:675)  5′-PO3-CAA CGG CAC (SEQ ID NO:676)
3.87  5′-PO3-GAG TGC TGA (SEQ ID NO:677)  5′-PO3-AGC ACT CAC (SEQ ID NO:678)
3.88  5′-PO3-GCT CCT TGA (SEQ ID NO:679)  5′-PO3-AAG GAG CAC (SEQ ID NO:680)
3.89  5′-PO3-CCG AAA GGA (SEQ ID NO:681)  5′-PO3-CTT TCG GAC (SEQ ID NO:682)
3.90  5′-PO3-CAC TGA GGA (SEQ ID NO:683)  5′-PO3-CTC AGT GAC (SEQ ID NO:684)
3.91  5′-PO3-CGT GCT GGA (SEQ ID NO:685)  5′-PO3-CAG CAC GAC (SEQ ID NO:686)
3.92  5′-PO3-CCG AAA CGA (SEQ ID NO:687)  5′-PO3-GTT TCG GAC (SEQ ID NO:688)
3.93  5′-PO3-GCG GAG AGA (SEQ ID NO:689)  5′-PO3-TCT CCG CAC (SEQ ID NO:690)
3.94  5′-PO3-GCC GTT AGA (SEQ ID NO:691)  5′-PO3-TAA CGG CAC (SEQ ID NO:692)
3.95  5′-PO3-TCT CGT GGA (SEQ ID NO:693)  5′-PO3-CAC GAG AAC (SEQ ID NO:694)
3.96  5′-PO3-CGT GCT AGA (SEQ ID NO:695)  5′-PO3-TAG CAC GAC (SEQ ID NO:696)

TABLE 6 Oligonucleotide tags used in cycle 4
Tag number  Top strand sequence  Bottom strand sequence
4.1  5′-PO3-GCCTGTCTT (SEQ ID NO:697)  5′-PO3-GAC AGG CTC (SEQ ID NO:698)
4.2  5′-PO3-CTCCTGGTT (SEQ ID NO:699)  5′-PO3-CCA GGA GTC (SEQ ID NO:700)
4.3  5′-PO3-ACTCTGCTT (SEQ ID NO:701)  5′-PO3-GCA GAG TTC (SEQ ID NO:702)
4.4  5′-PO3-CATCGCCTT (SEQ ID NO:703)  5′-PO3-GGC GAT GTC (SEQ ID NO:704)
4.5  5′-PO3-GCCACTATT (SEQ ID NO:705)  5′-PO3-TAG TGG CTC (SEQ ID NO:706)
4.6  5′-PO3-CACACGGTT (SEQ ID NO:707)  5′-PO3-CCG TGT GTC (SEQ ID NO:708)
4.7  5′-PO3-CAACGCCTT (SEQ ID NO:709)  5′-PO3-GGC GTT GTC (SEQ ID NO:710)
4.8  5′-PO3-ACTGAGGTT (SEQ ID NO:711)  5′-PO3-CCT CAG TTC (SEQ ID NO:712)
4.9  5′-PO3-GTGCTGGTT (SEQ ID NO:713)  5′-PO3-CCA GCA CTC (SEQ ID NO:714)
4.10  5′-PO3-CATCGACTT (SEQ ID NO:715)  5′-PO3-GTC GAT GTC (SEQ ID NO:716)
4.11  5′-PO3-CCATCGGTT (SEQ ID NO:717)  5′-PO3-CCG ATG GTC (SEQ ID NO:718)
4.12  5′-PO3-GCTGCACTT (SEQ ID NO:719)  5′-PO3-GTG CAG CTC (SEQ ID NO:720)
4.13  5′-PO3-ACAGAGGTT (SEQ ID NO:721)  5′-PO3-CCT CTG TTC (SEQ ID NO:722)
4.14  5′-PO3-AGTGCCGTT (SEQ ID NO:723)  5′-PO3-CGG CAC TTC (SEQ ID NO:724)
4.15  5′-PO3-CGGACATTT (SEQ ID NO:725)  5′-PO3-ATG TCC GTC (SEQ ID NO:726)
4.16  5′-PO3-GGTCTGGTT (SEQ ID NO:727)  5′-PO3-CCA GAC CTC (SEQ ID NO:728)
4.17  5′-PO3-GAGACGGTT (SEQ ID NO:729)  5′-PO3-CCG TCT CTC (SEQ ID NO:730)
4.18  5′-PO3-CTTTCCGTT (SEQ ID NO:731)  5′-PO3-CGG AAA GTC (SEQ ID NO:732)
4.19  5′-PO3-CAGATGGTT (SEQ ID NO:733)  5′-PO3-CCA TCT GTC (SEQ ID NO:734)
4.20  5′-PO3-CGGACACTT (SEQ ID NO:735)  5′-PO3-GTG TCC GTC (SEQ ID NO:736)
4.21  5′-PO3-ACTCTCGTT (SEQ ID NO:737)  5′-PO3-CGA GAG TTC (SEQ ID NO:738)
4.22  5′-PO3-GCAGCACTT (SEQ ID NO:739)  5′-PO3-GTG CTG CTC (SEQ ID NO:740)
4.23  5′-PO3-ACTCTCCTT (SEQ ID NO:741)  5′-PO3-GGA GAG TTC (SEQ ID NO:742)
4.24  5′-PO3-ACCTTGGTT (SEQ ID NO:743)  5′-PO3-CCA AGG TTC (SEQ ID NO:744)
4.25  5′-PO3-AGAGCCGTT (SEQ ID NO:745)  5′-PO3-CGG CTC TTC (SEQ ID NO:746)
4.26  5′-PO3-ACCTTGCTT (SEQ ID NO:747)  5′-PO3-GCA AGG TTC (SEQ ID NO:748)
4.27  5′-PO3-AAGTCCGTT (SEQ ID NO:749)  5′-PO3-CGG ACT TTC (SEQ ID NO:750)
4.28  5′-PO3-GGACTGGTT (SEQ ID NO:751)  5′-PO3-CCA GTC CTC (SEQ ID NO:752)
4.29  5′-PO3-GTCGTTCTT (SEQ ID NO:753)  5′-PO3-GAA CGA CTC (SEQ ID NO:754)
4.30  5′-PO3-CAGCATCTT (SEQ ID NO:755)  5′-PO3-GAT GCT GTC (SEQ ID NO:756)
4.31  5′-PO3-CTATCCGTT (SEQ ID NO:757)  5′-PO3-CGG ATA GTC (SEQ ID NO:758)
4.32  5′-PO3-ACACTCGTT (SEQ ID NO:759)  5′-PO3-CGA GTG TTC (SEQ ID NO:760)
4.33  5′-PO3-ATCCAGGTT (SEQ ID NO:761)  5′-PO3-CCT GGA TTC (SEQ ID NO:762)
4.34  5′-PO3-GTTCCTGTT (SEQ ID NO:763)  5′-PO3-CAG GAA CTC (SEQ ID NO:764)
4.35  5′-PO3-ACACTCCTT (SEQ ID NO:765)  5′-PO3-GGA GTG TTC (SEQ ID NO:766)
4.36  5′-PO3-GTTCCTCTT (SEQ ID NO:767)  5′-PO3-GAG GAA CTC (SEQ ID NO:768)
4.37  5′-PO3-CTGGCTCTT (SEQ ID NO:769)  5′-PO3-GAG CCA GTC (SEQ ID NO:770)
4.38  5′-PO3-ACGGCATTT (SEQ ID NO:771)  5′-PO3-ATG CCG TTC (SEQ ID NO:772)
4.39  5′-PO3-GGTGAGGTT (SEQ ID NO:773)  5′-PO3-CCT CAC CTC (SEQ ID NO:774)
4.40  5′-PO3-CCTTCCGTT (SEQ ID NO:775)  5′-PO3-CGG AAG GTC (SEQ ID NO:776)
4.41  5′-PO3-TACGCTCTT (SEQ ID NO:777)  5′-PO3-GAG CGT ATC (SEQ ID NO:778)
4.42  5′-PO3-ACGGCAGTT (SEQ ID NO:779)  5′-PO3-CTG CCG TTC (SEQ ID NO:780)
4.43  5′-PO3-ACTGACGTT (SEQ ID NO:781)  5′-PO3-CGT CAG TTC (SEQ ID NO:782)
4.44  5′-PO3-ACGGCACTT (SEQ ID NO:783)  5′-PO3-GTG CCG TTC (SEQ ID NO:784)
4.45  5′-PO3-ACTGACCTT (SEQ ID NO:785)  5′-PO3-GGT CAG TTC (SEQ ID NO:786)
4.46  5′-PO3-TTTGCGGTT (SEQ ID NO:787)  5′-PO3-CCG CAA ATC (SEQ ID NO:788)
4.47  5′-PO3-TGGTAGGTT (SEQ ID NO:789)  5′-PO3-CCT ACC ATC (SEQ ID NO:790)
4.48  5′-PO3-GTTCGGCTT (SEQ ID NO:791)  5′-PO3-GCC GAA CTC (SEQ ID NO:792)
4.49  5′-PO3-GCCGTTCTT (SEQ ID NO:793)  5′-PO3-GAA CGG CTC (SEQ ID NO:794)
4.50  5′-PO3-GGAGAGGTT (SEQ ID NO:795)  5′-PO3-CCT CTC CTC (SEQ ID NO:796)
4.51  5′-PO3-CACTGACTT (SEQ ID NO:797)  5′-PO3-GTC AGT GTC (SEQ ID NO:798)
4.52  5′-PO3-CGTGCTCTT (SEQ ID NO:799)  5′-PO3-GAG CAC GTC (SEQ ID NO:800)
4.53  5′-PO3-AATCCGCTT (SEQ ID NO:801)  5′-PO3-GCG GAT TTC (SEQ ID NO:802)
4.54  5′-PO3-AGGCTGGTT (SEQ ID NO:803)  5′-PO3-CCA GCC TTC (SEQ ID NO:804)
4.55  5′-PO3-GCTAGTGTT (SEQ ID NO:805)  5′-PO3-CAC TAG CTC (SEQ ID NO:806)
4.56  5′-PO3-GGAGAGCTT (SEQ ID NO:807)  5′-PO3-GCT CTC CTC (SEQ ID NO:808)
4.57  5′-PO3-GGAGAGATT (SEQ ID NO:809)  5′-PO3-TCT CTC CTC (SEQ ID NO:810)
4.58  5′-PO3-AGGCTGCTT (SEQ ID NO:811)  5′-PO3-GCA GCC TTC (SEQ ID NO:812)
4.59  5′-PO3-GAGTGCGTT (SEQ ID NO:813)  5′-PO3-CGC ACT CTC (SEQ ID NO:814)
4.60  5′-PO3-CCATCCATT (SEQ ID NO:815)  5′-PO3-TGG ATG GTC (SEQ ID NO:816)
4.61  5′-PO3-GCTAGTCTT (SEQ ID NO:817)  5′-PO3-GAC TAG CTC (SEQ ID NO:818)
4.62  5′-PO3-AGGCTGATT (SEQ ID NO:819)  5′-PO3-TCA GCC TTC (SEQ ID NO:820)
4.63  5′-PO3-ACAGACGTT (SEQ ID NO:821)  5′-PO3-CGT CTG TTC (SEQ ID NO:822)
4.64  5′-PO3-GAGTGCCTT (SEQ ID NO:823)  5′-PO3-GGC ACT CTC (SEQ ID NO:824)
4.65  5′-PO3-ACAGACCTT (SEQ ID NO:825)  5′-PO3-GGT CTG TTC (SEQ ID NO:826)
4.66  5′-PO3-CGAGCTTTT (SEQ ID NO:827)  5′-PO3-AAG CTC GTC (SEQ ID NO:828)
4.67  5′-PO3-TTAGCGGTT (SEQ ID NO:829)  5′-PO3-CCG CTA ATC (SEQ ID NO:830)
4.68  5′-PO3-CCTCTTGTT (SEQ ID NO:831)  5′-PO3-CAA GAG GTC (SEQ ID NO:832)
4.69  5′-PO3-GGTCTCTTT (SEQ ID NO:833)  5′-PO3-AGA GAC CTC (SEQ ID NO:834)
4.70  5′-PO3-GCCAGATTT (SEQ ID NO:835)  5′-PO3-ATC TGG CTC (SEQ ID NO:836)
4.71  5′-PO3-GAGACCTTT (SEQ ID NO:837)  5′-PO3-AGG TCT CTC (SEQ ID NO:838)
4.72  5′-PO3-CACACAGTT (SEQ ID NO:839)  5′-PO3-CTG TGT GTC (SEQ ID NO:840)
4.73  5′-PO3-CCTCTTCTT (SEQ ID NO:841)  5′-PO3-GAA GAG GTC (SEQ ID NO:842)
4.74  5′-PO3-TAGAGCGTT (SEQ ID NO:843)  5′-PO3-CGC TCT ATC (SEQ ID NO:844)
4.75  5′-PO3-GCACCTTTT (SEQ ID NO:845)  5′-PO3-AAG GTG CTC (SEQ ID NO:846)
4.76  5′-PO3-GGCTTGTTT (SEQ ID NO:847)  5′-PO3-ACA AGC CTC (SEQ ID NO:848)
4.77  5′-PO3-GACGCGATT (SEQ ID NO:849)  5′-PO3-TCG CGT CTC (SEQ ID NO:850)
4.78  5′-PO3-CGAGCTGTT (SEQ ID NO:851)  5′-PO3-CAG CTC GTC (SEQ ID NO:852)
4.79  5′-PO3-TAGAGCCTT (SEQ ID NO:853)  5′-PO3-GGC TCT ATC (SEQ ID NO:854)
4.80  5′-PO3-CATCCGTTT (SEQ ID NO:855)  5′-PO3-ACG GAT GTC (SEQ ID NO:856)
4.81  5′-PO3-GGTCTCGTT (SEQ ID NO:857)  5′-PO3-CGA GAC CTC (SEQ ID NO:858)
4.82  5′-PO3-GCCAGAGTT (SEQ ID NO:859)  5′-PO3-CTC TGG CTC (SEQ ID NO:860)
4.83  5′-PO3-GAGACCGTT (SEQ ID NO:861)  5′-PO3-CGG TCT CTC (SEQ ID NO:862)
4.84  5′-PO3-CGAGCTATT (SEQ ID NO:863)  5′-PO3-TAG CTC GTC (SEQ ID NO:864)
4.85  5′-PO3-GCAAGTGTT (SEQ ID NO:865)  5′-PO3-CAC TTG CTC (SEQ ID NO:866)
4.86  5′-PO3-GGTCTCCTT (SEQ ID NO:867)  5′-PO3-GGA GAC CTC (SEQ ID NO:868)
4.87  5′-PO3-GCCAGACTT (SEQ ID NO:869)  5′-PO3-GTC TGG CTC (SEQ ID NO:870)
4.88  5′-PO3-GGTCTCATT (SEQ ID NO:871)  5′-PO3-TGA GAC CTC (SEQ ID NO:872)
4.89  5′-PO3-GAGACCATT (SEQ ID NO:873)  5′-PO3-TGG TCT CTC (SEQ ID NO:874)
4.90  5′-PO3-CCTTCAGTT (SEQ ID NO:875)  5′-PO3-CTG AAG GTC (SEQ ID NO:876)
4.91  5′-PO3-GCACCTGTT (SEQ ID NO:877)  5′-PO3-CAG GTG CTC (SEQ ID NO:878)
4.92  5′-PO3-AAAGGCGTT (SEQ ID NO:879)  5′-PO3-CGC CTT TTC (SEQ ID NO:880)
4.93  5′-PO3-CAGATCGTT (SEQ ID NO:881)  5′-PO3-CGA TCT GTC (SEQ ID NO:882)
4.94  5′-PO3-CATAGGCTT (SEQ ID NO:883)  5′-PO3-GCC TAT GTC (SEQ ID NO:884)
4.95  5′-PO3-CCTTCACTT (SEQ ID NO:885)  5′-PO3-GTG AAG GTC (SEQ ID NO:886)
4.96  5′-PO3-GCACCTCTT (SEQ ID NO:887)  5′-PO3-GAG GTG CTC (SEQ ID NO:888)
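Reading Tables 4 to 6, each tag appears to be a duplex whose top and bottom strands (both written 5′ to 3′) pair over their first seven bases and leave a two-base 3′ overhang on each strand, with the top-strand overhang of one cycle complementary to the bottom-strand overhang of the next. The text does not state this design rule explicitly, so the Python sketch below is an illustration under that assumption; the helper names are hypothetical. It checks the rule on tags 2.55, 3.1 and 4.1 and on the closing primer used after Cycle 4, whose strands pair over 25 bases.

```python
# Watson-Crick complement for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a 5'->3' DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_overhangs(top: str, bottom: str, paired: int = 7) -> tuple:
    """Check that the first `paired` bases of each strand anneal; return both 3' overhangs.

    Both strands are given 5'->3', as in the tables; spaces are ignored.
    """
    top, bottom = top.replace(" ", ""), bottom.replace(" ", "")
    if bottom[:paired] != revcomp(top[:paired]):
        raise ValueError(f"{top}/{bottom}: strands do not pair over {paired} nt")
    return top[paired:], bottom[paired:]


def ligatable(upstream_top_overhang: str, downstream_bottom_overhang: str) -> bool:
    """Two 3' overhangs anneal when one is the reverse complement of the other."""
    return downstream_bottom_overhang == revcomp(upstream_top_overhang)


# Strands transcribed from Tables 4-6 and the closing primer (5'-phosphates omitted).
tag_2_55 = duplex_overhangs("AGCACTGGT", "CAGTGCTCT")
tag_3_1 = duplex_overhangs("CAG CTA CGA", "GTA GCT GAC")
tag_4_1 = duplex_overhangs("GCCTGTCTT", "GAC AGG CTC")
closing = duplex_overhangs("CAG AAG ACA GAC AAG CTT CAC CTG C",
                           "GCA GGT GAA GCT TGT CTG TCT TCT GAA", paired=25)

print(tag_2_55, tag_3_1, tag_4_1, closing)
print(ligatable(tag_2_55[0], tag_3_1[1]),   # cycle 2 -> cycle 3
      ligatable(tag_3_1[0], tag_4_1[1]),    # cycle 3 -> cycle 4
      ligatable(tag_4_1[0], closing[1]))    # cycle 4 -> closing primer
```

Because the overhangs in this reading are shared by every tag of a given cycle, any Cycle 3 tag can in principle ligate to any Cycle 2 tag, which is what a split-and-pool encoding requires.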

TABLE 7 Correspondence between building blocks and oligonucleotide tags for Cycles 1-4
Building block  Cycle 1  Cycle 2  Cycle 3  Cycle 4
BB1  1.1  2.1  3.1  4.1
BB2  1.2  2.2  3.2  4.2
BB3  1.3  2.3  3.3  4.3
BB4  1.4  2.4  3.4  4.4
BB5  1.5  2.5  3.5  4.5
BB6  1.6  2.6  3.6  4.6
BB7  1.7  2.7  3.7  4.7
BB8  1.8  2.8  3.8  4.8
BB9  1.9  2.9  3.9  4.9
BB10  1.10  2.10  3.10  4.10
BB11  1.11  2.11  3.11  4.11
BB12  1.12  2.12  3.12  4.12
BB13  1.13  2.13  3.13  4.13
BB14  1.14  2.14  3.14  4.14
BB15  1.15  2.15  3.15  4.15
BB16  1.16  2.16  3.16  4.16
BB17  1.17  2.17  3.17  4.17
BB18  1.18  2.18  3.18  4.18
BB19  1.19  2.19  3.19  4.19
BB20  1.20  2.20  3.20  4.20
BB21  1.21  2.21  3.21  4.21
BB22  1.22  2.22  3.22  4.22
BB23  1.23  2.23  3.23  4.23
BB24  1.24  2.24  3.24  4.24
BB25  1.25  2.25  3.25  4.25
BB26  1.26  2.26  3.26  4.26
BB27  1.27  2.27  3.27  4.27
BB28  1.28  2.28  3.28  4.28
BB29  1.29  2.29  3.29  4.29
BB30  1.30  2.30  3.30  4.30
BB31  1.31  2.31  3.31  4.31
BB32  1.32  2.32  3.32  4.32
BB33  1.33  2.33  3.33  4.33
BB34  1.34  2.34  3.34  4.34
BB35  1.35  2.35  3.35  4.35
BB36  1.36  2.36  3.36  4.36
BB37  1.37  2.37  3.37  4.37
BB38  1.38  2.38  3.38  4.38
BB39  1.39  2.39  3.39  4.39
BB40  1.44  2.44  3.44  4.44
BB41  1.41  2.41  3.41  4.41
BB42  1.42  2.42  3.42  4.42
BB43  1.43  2.43  3.43  4.43
BB44  1.40  2.40  3.40  4.40
BB45  1.45  2.45  3.45  4.45
BB46  1.46  2.46  3.46  4.46
BB47  1.47  2.47  3.47  4.47
BB48  1.48  2.48  3.48  4.48
BB49  1.49  2.49  3.49  4.49
BB50  1.50  2.50  3.50  4.50
BB51  1.51  2.51  3.51  4.51
BB52  1.52  2.52  3.52  4.52
BB53  1.53  2.53  3.53  4.53
BB54  1.54  2.54  3.54  4.54
BB55  1.55  2.55  3.55  4.55
BB56  1.56  2.56  3.56  4.56
BB57  1.57  2.57  3.57  4.57
BB58  1.58  2.58  3.58  4.58
BB59  1.59  2.59  3.59  4.59
BB60  1.60  2.60  3.60  4.60
BB61  1.61  2.61  3.61  4.61
BB62  1.62  2.62  3.62  4.62
BB63  1.63  2.63  3.63  4.63
BB64  1.64  2.64  3.64  4.64
BB65  1.65  2.65  3.65  4.65
BB66  1.66  2.66  3.66  4.66
BB67  1.67  2.67  3.67  4.67
BB68  1.68  2.68  3.68  4.68
BB69  1.69  2.69  3.69  4.69
BB70  1.70  2.70  3.70  4.70
BB71  1.71  2.71  3.71  4.71
BB72  1.72  2.72  3.72  4.72
BB73  1.73  2.73  3.73  4.73
BB74  1.74  2.74  3.74  4.74
BB75  1.75  2.75  3.75  4.75
BB76  1.76  2.76  3.76  4.76
BB77  1.77  2.77  3.77  4.77
BB78  1.78  2.78  3.78  4.78
BB79  1.79  2.79  3.79  4.79
BB80  1.80  2.80  3.80  4.80
BB81  1.81  2.81  3.81  4.81
BB82  1.82  2.82  3.82  4.82
BB83  1.96  2.96  3.96  4.96
BB84  1.83  2.83  3.83  4.83
BB85  1.84  2.84  3.84  4.84
BB86  1.85  2.85  3.85  4.85
BB87  1.86  2.86  3.86  4.86
BB88  1.87  2.87  3.87  4.87
BB89  1.88  2.88  3.88  4.88
BB90  1.89  2.89  3.89  4.89
BB91  1.90  2.90  3.90  4.90
BB92  1.91  2.91  3.91  4.91
BB93  1.92  2.92  3.92  4.92
BB94  1.93  2.93  3.93  4.93
BB95  1.94  2.94  3.94  4.94
BB96  1.95  2.95  3.95  4.95

1× ligase buffer: 50 mM Tris, pH 7.5; 10 mM dithiothreitol; 10 mM MgCl₂; 2 mM ATP; 50 mM NaCl.

10× ligase buffer: 500 mM Tris, pH 7.5; 100 mM dithiothreitol; 100 mM MgCl₂; 20 mM ATP; 500 mM NaCl.
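The 10× recipe is the 1× recipe concentrated uniformly tenfold, so one volume of the 10× stock in ten volumes of reaction gives 1× (C1V1 = C2V2). A small Python check of that relationship follows; the dictionary layout and function names are illustrative only.

```python
# Component concentrations (mM) transcribed from the two recipes above.
BUFFER_1X = {"Tris": 50, "dithiothreitol": 10, "MgCl2": 10, "ATP": 2, "NaCl": 50}
BUFFER_10X = {"Tris": 500, "dithiothreitol": 100, "MgCl2": 100, "ATP": 20, "NaCl": 500}


def fold_concentration(stock, working):
    """Return the single factor relating a stock to its working buffer, or raise if not uniform."""
    folds = {name: stock[name] / working[name] for name in working}
    if len(set(folds.values())) != 1:
        raise ValueError(f"components are not concentrated uniformly: {folds}")
    return next(iter(folds.values()))


def stock_volume(final_volume_uL, fold):
    """Volume of a fold-concentrated stock that brings final_volume_uL to 1x (C1V1 = C2V2)."""
    return final_volume_uL / fold


fold = fold_concentration(BUFFER_10X, BUFFER_1X)
print(fold, stock_volume(100.0, fold))
```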

Attachment of Water Soluble Spacer to Compound 2

To a solution of Compound 2 (60 mL, 1 mM) in sodium borate buffer (150 mM, pH 9.4) that had been chilled to 4° C. were added 40 equivalents of N-Fmoc-15-amino-4,7,10,13-tetraoxaoctadecanoic acid (S-Ado) in N,N-dimethylformamide (DMF) (16 mL, 0.15 M), followed by 40 equivalents of 4-(4,6-dimethoxy[1,3,5]triazin-2-yl)-4-methylmorpholinium chloride hydrate (DMTMM) in water (9.6 mL, 0.25 M). The mixture was gently shaken for 2 hours at 4° C. before an additional 40 equivalents each of S-Ado and DMTMM were added, and shaking was continued for a further 16 hours at 4° C.
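Each coupling in this Example is dosed in molar equivalents relative to the DNA conjugate, and the stated volumes follow from one relation: stock volume = substrate amount × equivalents / stock concentration. In the Python sketch below (the helper name is hypothetical), amounts are in nmol and concentrations in mM, so volumes come out directly in μL. For the S-Ado coupling it reproduces the 16 mL and 9.6 mL given above.

```python
def reagent_volume_uL(substrate_nmol, equivalents, stock_mM):
    """Volume of reagent stock that delivers `equivalents` per mole of substrate.

    Units cancel directly: nmol / mM = uL.
    """
    return substrate_nmol * equivalents / stock_mM


# S-Ado coupling: 60 mL (60,000 uL) of 1 mM Compound 2.
compound_2_nmol = 60_000 * 1.0                             # uL x mM = nmol
s_ado_uL = reagent_volume_uL(compound_2_nmol, 40, 150.0)    # 0.15 M S-Ado in DMF
dmtmm_uL = reagent_volume_uL(compound_2_nmol, 40, 250.0)    # 0.25 M DMTMM in water
print(s_ado_uL / 1000, dmtmm_uL / 1000)                     # convert to mL
```

The same helper gives the per-well Cycle 1 volumes (50 nmol of substrate: 13.3 μL of 0.15 M building block, stated as 13 μL; 8 μL of 0.25 M DMT-MM; 2 μL of 0.25 M acetic acid N-hydroxysuccinimide ester for 10 equivalents) and the 0.2 μL additions of 200 mM stocks used for 4 equivalents in the cyclization of the DNA-linked peptide in Example 5.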

Following acylation, a 0.1× volume of 5 M aqueous NaCl and a 2.5× volume of cold (−20° C.) ethanol were added, and the mixture was allowed to stand at −20° C. for at least one hour. The mixture was then centrifuged for 15 minutes at 14,000 rpm in a 4° C. centrifuge to give a white pellet, which was washed with cold EtOH and then dried in a lyophilizer at room temperature for 30 minutes. The solid was dissolved in 40 mL of water and purified by Reverse Phase HPLC with a Waters Xterra RP₁₈ column. A binary mobile phase gradient profile was used to elute the product, using a 50 mM aqueous triethylammonium acetate buffer at pH 7.5 and a 99% acetonitrile/1% water solution. The purified material was concentrated by lyophilization, and the resulting residue was dissolved in 5 mL of water. A 0.1× volume of piperidine was added to the solution, and the mixture was gently shaken for 45 minutes at room temperature. The product was then purified by ethanol precipitation as described above and isolated by centrifugation. The resulting pellet was washed twice with cold EtOH and dried by lyophilization to give purified Compound 3.
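The ethanol precipitation uses fixed multiples of the aqueous volume. The Python sketch below assumes that both multiples refer to the starting aqueous volume (the text does not say whether the ethanol volume is based on the volume before or after the salt is added); the function names are hypothetical.

```python
def precipitation_additions(aqueous_mL):
    """0.1 volume of 5 M NaCl and 2.5 volumes of cold ethanol.

    Assumes both multiples refer to the starting aqueous volume.
    """
    return {"nacl_mL": 0.1 * aqueous_mL, "ethanol_mL": 2.5 * aqueous_mL}


def added_nacl_mM(aqueous_mL):
    """Concentration of the added NaCl once both salt and ethanol are in."""
    adds = precipitation_additions(aqueous_mL)
    total_mL = aqueous_mL + adds["nacl_mL"] + adds["ethanol_mL"]
    return 5000.0 * adds["nacl_mL"] / total_mL  # 5 M stock = 5000 mM


adds = precipitation_additions(100.0)   # hypothetical 100 mL reaction
print(adds, round(added_nacl_mM(100.0), 1))
```

For a hypothetical 100 mL reaction this gives 10 mL of 5 M NaCl and 250 mL of ethanol. Because every term scales with the aqueous volume, the added NaCl ends up near 139 mM at any scale (ignoring volume contraction on mixing ethanol and water).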

Cycle 1

To each well of a 96-well plate were added 12.5 μL of a 4 mM solution of Compound 3 in water and 100 μL of a 1 mM solution of one of oligonucleotide tags 1.1 to 1.96, as shown in Table 3 (the molar ratio of Compound 3 to tag was 1:2). The plates were heated to 95° C. for 1 minute and then cooled to 16° C. over 10 minutes. To each well were added 10 μL of 10× ligase buffer, 30 units of T4 DNA ligase (1 μL of a 30 unit/μL solution; Fermentas Life Science, Cat. No. EL0013) and 76.5 μL of water, and the resulting solutions were incubated at 16° C. for 16 hours.
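The composition of each Cycle 1 ligation well can be checked with simple unit arithmetic (μL × mM = nmol). This Python sketch uses the volumes and concentrations from the paragraph above; the function name is illustrative. It confirms a 200 μL reaction, two moles of tag per mole of Compound 3 (the stated 1:2 ratio), and a Compound 3 concentration of 0.25 mM, the same final library concentration specified for the ligations in Cycles 2-4.

```python
def ligation_well(compound_uL=12.5, compound_mM=4.0, tag_uL=100.0, tag_mM=1.0,
                  buffer_uL=10.0, ligase_uL=1.0, water_uL=76.5):
    """Totals for one Cycle 1 ligation well; defaults are the values given in the text.

    Uses uL x mM = nmol and nmol / uL = mM.
    """
    compound_nmol = compound_uL * compound_mM
    tag_nmol = tag_uL * tag_mM
    total_uL = compound_uL + tag_uL + buffer_uL + ligase_uL + water_uL
    return {
        "total_uL": total_uL,
        "tag_to_compound": tag_nmol / compound_nmol,
        "compound_final_mM": compound_nmol / total_uL,
    }


print(ligation_well())
```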

After the ligation reaction, 20 μL of 5 M aqueous NaCl was added directly to each well, followed by 500 μL of cold (−20° C.) ethanol, and the plates were held at −20° C. for 1 hour. The plates were centrifuged for 1 hour at 3200 g in a Beckman Coulter Allegra 6R centrifuge using Beckman Microplus Carriers. The supernatant was carefully removed by inverting the plate, and the pellet was washed with cold (−20° C.) 70% aqueous ethanol. Each pellet was then dissolved in sodium borate buffer (50 μL, 150 mM, pH 9.4) to a concentration of 1 mM and chilled to 4° C.

To each solution were added 40 equivalents of one of the 96 building block precursors in DMF (13 μL, 0.15 M), followed by 40 equivalents of DMT-MM in water (8 μL, 0.25 M), and the solutions were gently shaken at 4° C. After 2 hours, an additional 40 equivalents of the same building block precursor and of DMT-MM were added to each solution, and the solutions were gently shaken for 16 hours at 4° C. Following acylation, 10 equivalents of acetic acid N-hydroxysuccinimide ester in DMF (2 μL, 0.25 M) were added to each solution, and the solutions were gently shaken for 10 minutes.

Following acylation, the 96 reaction mixtures were pooled, 0.1 volume of 5 M aqueous NaCl and 2.5 volumes of cold absolute ethanol were added, and the solution was allowed to stand at −20° C. for at least one hour. The mixture was then centrifuged. Following centrifugation, as much supernatant as possible was removed with a micropipette, and the pellet was washed with cold ethanol and centrifuged again. The supernatant was removed with a 200 μL pipette. Cold 70% ethanol was added to the tube, and the resulting mixture was centrifuged for 5 min at 4° C.

The supernatant was removed, and the remaining ethanol was removed by lyophilization at room temperature for 10 minutes. The pellet was then dissolved in 2 mL of water and purified by Reverse Phase HPLC with a Waters Xterra RP₁₈ column. A binary mobile phase gradient profile was used to elute the library, using a 50 mM aqueous triethylammonium acetate buffer at pH 7.5 and a 99% acetonitrile/1% water solution. The fractions containing the library were collected, pooled, and lyophilized. The resulting residue was dissolved in 2.5 mL of water, and 250 μL of piperidine was added. The solution was shaken gently for 45 minutes and then precipitated with ethanol as previously described. The resulting pellet was dried by lyophilization and then dissolved in sodium borate buffer (4.8 mL, 150 mM, pH 9.4) to a concentration of 1 mM.

The solution was chilled to 4° C., and 40 equivalents each of N-Fmoc-propargylglycine in DMF (1.2 mL, 0.15 M) and DMT-MM in water (7.7 mL, 0.25 M) were added. The mixture was gently shaken for 2 hours at 4° C. before an additional 40 equivalents of N-Fmoc-propargylglycine and DMT-MM were added, and the solution was shaken for a further 16 hours. The mixture was then purified by EtOH precipitation and Reverse Phase HPLC as described above, and the N-Fmoc group was removed by treatment with piperidine as previously described. After final purification by EtOH precipitation, the resulting pellet was dried by lyophilization and carried into the next cycle of synthesis.

Cycles 2-4

For each of these cycles, the dried pellet from the previous cycle was dissolved in water, and the concentration of the library was determined by spectrophotometry based on the extinction coefficient of the DNA component of the library, where the initial extinction coefficient of Compound 2 is 131,500 L/(mol·cm). The concentration of the library was adjusted with water such that the final concentration in the subsequent ligation reactions was 0.25 mM. The library was then divided into 96 equal aliquots in a 96-well plate. To each well was added a solution comprising a different tag (the molar ratio of library to tag was 1:2), and ligations were performed as described for Cycle 1. Oligonucleotide tags used in Cycles 2, 3 and 4 are set forth in Tables 4, 5 and 6, respectively. Correspondence between the tags and the building block precursors for each of Cycles 1 to 4 is provided in Table 7. The library was precipitated by the addition of ethanol as described above for Cycle 1 and dissolved in sodium borate buffer (150 mM, pH 9.4) to a concentration of 1 mM. Subsequent acylations and purifications were performed as described for Cycle 1, except that HPLC purification was omitted during Cycle 3.
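The spectrophotometric step is an application of the Beer-Lambert law, c = A/(εl). A minimal Python sketch follows; the absorbance reading, the 1:100 dilution and the function name are hypothetical, and only the coefficient for Compound 2 comes from the text. The text does not state the measurement wavelength or how the coefficient was updated as tags were ligated in later cycles, so for those cycles the appropriate value would have to be supplied as `epsilon`.

```python
EPSILON_COMPOUND_2 = 131_500.0  # L/(mol*cm), initial value stated for Compound 2


def concentration_mM(absorbance, epsilon, path_cm=1.0, dilution_factor=1.0):
    """Beer-Lambert law c = A / (epsilon * l), corrected for dilution of the measured aliquot."""
    molar = absorbance / (epsilon * path_cm)
    return molar * dilution_factor * 1000.0  # mol/L -> mmol/L


# Hypothetical reading: A = 0.526 on a 1:100 dilution in a 1 cm cuvette.
print(round(concentration_mM(0.526, EPSILON_COMPOUND_2, dilution_factor=100), 3))
```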

The products of Cycle 4 were ligated with the closing primer shown below, using the method described above for ligation of tags.

5′-PO₃-CAG AAG ACA GAC AAG CTT CAC CTG C (SEQ ID NO:889)
5′-PO₃-GCA GGT GAA GCT TGT CTG TCT TCT GAA (SEQ ID NO:890)

Results:

The synthetic procedure described above is capable of producing a library comprising 96⁴ (84,934,656, or about 10⁸) different structures. Synthesis of the library was monitored by gel electrophoresis and LC/MS of the product of each cycle. Upon completion, the library was analyzed using several techniques. FIG. 13a is a chromatogram of the library following Cycle 4, but before ligation of the closing primer; FIG. 13b is a mass spectrum of the library at the same synthetic stage. The average molecular weight was determined by negative ion LC/MS analysis, and the ion signal was deconvoluted using ProMass software. The resulting average mass is consistent with the predicted average mass of the library.

The DNA component of the library was analyzed by agarose gel electrophoresis, which showed that the majority of the library material corresponds to ligated product of the correct size. DNA sequence analysis of molecular clones of PCR product derived from a sampling of the library showed that DNA ligation occurred with high fidelity and to near completion.
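Once the tag added in each cycle has been identified (for example, from sequencing reads), Table 7 gives the building block it encodes. Table 7 is an identity mapping except that BB40 and BB44 exchange tags 44 and 40, BB83 uses tag 96, and BB84 to BB96 use tags 83 to 95, with the same pattern in every cycle. The Python sketch below inverts that table; the function names are hypothetical, and identifying tags from raw reads is not shown.

```python
def bb_for_tag(tag_index: int) -> int:
    """Building-block number encoded by tag index 1-96 (same pattern in every cycle of Table 7)."""
    if not 1 <= tag_index <= 96:
        raise ValueError(f"tag index out of range: {tag_index}")
    swaps = {40: 44, 44: 40, 96: 83}   # BB40/BB44 exchange tags; BB83 uses tag 96
    if tag_index in swaps:
        return swaps[tag_index]
    if 83 <= tag_index <= 95:           # BB84-BB96 use tags 83-95
        return tag_index + 1
    return tag_index


def decode(tag_ids):
    """Map tag numbers such as '3.44' (cycle 3, tag 44), listed in cycle order, to building blocks."""
    blocks = []
    for expected_cycle, tag in enumerate(tag_ids, start=1):
        cycle, index = (int(part) for part in tag.split("."))
        if cycle != expected_cycle:
            raise ValueError(f"expected a cycle {expected_cycle} tag, got {tag}")
        blocks.append(f"BB{bb_for_tag(index)}")
    return blocks


print(decode(["1.44", "2.96", "3.1", "4.83"]))
```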

Library Cyclization

At the completion of Cycle 4, a portion of the library was capped at the N-terminus with azidoacetic acid under the usual acylation conditions. The product, after purification by EtOH precipitation, was dissolved in sodium phosphate buffer (150 mM, pH 8) to a concentration of 1 mM, and 4 equivalents each of CuSO₄ in water (200 mM), ascorbic acid in water (200 mM), and a solution of the compound shown below in DMF (200 mM) were added. The reaction mixture was then gently shaken for 2 hours at room temperature.

To assay the extent of cyclization, 5 μL aliquots of the library cyclization reaction were removed and treated with a fluorescently labeled azide or alkyne (1 μL of 100 mM DMF stocks) prepared as described in Example 4. After 16 hours, HPLC analysis at 500 nm showed that neither the alkyne nor the azide label had been incorporated into the library. This result indicated that the library no longer contained azide or alkyne groups capable of cycloaddition, and therefore that the library must have reacted with itself, either through cyclization or through intermolecular reactions. The cyclized library was purified by Reverse Phase HPLC as previously described. Control experiments using uncyclized library showed complete incorporation of the fluorescent tags mentioned above.

Example 4 Preparation of Fluorescent Tags for Cyclization Assay

In separate tubes, propargyl glycine or 2-amino-3-phenylpropylazide (8 μmol each) was combined with FAM-OSu (Molecular Probes Inc.) (1.2 equiv.) in pH 9.4 borate buffer (250 μL). The reactions were allowed to proceed for 3 h at room temperature and were then lyophilized overnight. Purification by HPLC afforded the desired fluorescent alkyne and azide in quantitative yield.

Example 5 Cyclization of Individual Compounds using the Azide/Alkyne Cycloaddition Reaction

Preparation of Azidoacetyl-Gly-Pro-Phe-Pra-NH₂:

Using 0.3 mmol of Rink amide resin, the indicated sequence was synthesized by standard solid-phase synthesis techniques with Fmoc-protected amino acids and HATU as the activating agent (Pra = C-propargylglycine). Azidoacetic acid was used to cap the tetrapeptide. The peptide was cleaved from the resin with 20% TFA/DCM for 4 h. Purification by RP HPLC afforded the product as a white solid (75 mg, 51%). ¹H NMR (DMSO-d₆, 400 MHz): 8.4-7.8 (m, 3H), 7.4-7.1 (m, 7H), 4.6-4.4 (m, 1H), 4.4-4.2 (m, 2H), 4.0-3.9 (m, 2H), 3.74 (dd, 1H, J=6 Hz, 17 Hz), 3.5-3.3 (m, 2H), 3.07 (dt, 1H, J=5 Hz, 14 Hz), 2.92 (dd, 1H, J=5 Hz, 16 Hz), 2.86 (t, 1H, J=2 Hz), 2.85-2.75 (m, 1H), 2.6-2.4 (m, 2H), 2.2-1.6 (m, 4H). IR (mull) 2900, 2100, 1450, 1300 cm⁻¹. ESIMS 497.4 ([M+H], 100%), 993.4 ([2M+H], 50%). ESIMS with ion-source fragmentation: 519.3 ([M+Na], 100%), 491.3 (100%), 480.1 ([M-NH₂], 90%), 452.2 ([M-NH₂-CO], 20%), 424.2 (20%), 385.1 ([M-Pra], 50%), 357.1 ([M-Pra-CO], 40%), 238.0 ([M-Pra-Phe], 100%).

Cyclization of Azidoacetyl-Gly-Pro-Phe-Pra-NH₂:

The azidoacetyl peptide (31 mg, 0.062 mmol) was dissolved in MeCN (30 mL). Diisopropylethylamine (DIEA, 1 mL) and Cu(MeCN)₄PF₆ (1 mg) were added. After stirring for 1.5 h, the solution was evaporated, and the resulting residue was taken up in 20% MeCN/H₂O. After centrifugation to remove insoluble salts, the solution was subjected to preparative reverse phase HPLC. The desired cyclic peptide was isolated as a white solid (10 mg, 32%). ¹H NMR (DMSO-d₆, 400 MHz): 8.28 (t, 1H, J=5 Hz), 7.77 (s, 1H), 7.2-6.9 (m, 9H), 4.98 (m, 2H), 4.48 (m, 1H), 4.28 (m, 1H), 4.1-3.9 (m, 2H), 3.63 (dd, 1H, J=5 Hz, 16 Hz), 3.33 (m, 2H), 3.0 (m, 3H), 2.48 (dd, 1H, J=11 Hz, 14 Hz), 1.75 (m, 1H), 1.55 (m, 1H), 1.32 (m, 1H), 1.05 (m, 1H). IR (mull) 2900, 1475, 1400 cm⁻¹. ESIMS 497.2 ([M+H], 100%), 993.2 ([2M+H], 30%), 1015.2 ([2M+Na], 15%). ESIMS with ion-source fragmentation: 535.2 (70%), 519.3 ([M+Na], 100%), 497.2 ([M+H], 80%), 480.1 ([M-NH₂], 30%), 452.2 ([M-NH₂-CO], 40%), 208.1 (60%).
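To relate the reported masses of peptide to molar amounts and yields, the molar mass can be computed from the molecular formula. The formulas used below (C23H28N8O5 for azidoacetyl-Gly-Pro-Phe-Pra-NH₂ and C25H30N8O7 for azidoacetyl-Gly-Pro-Phe-Pra-Gly-OH) are our own derivation from the sequences, not values stated in the text, and the helper names are illustrative. With them, 31 mg of the tetrapeptide is about 0.062 mmol, 10 mg of cyclic product corresponds to about a 32% yield (the intramolecular cycloaddition adds no atoms, so the molar mass is unchanged), and 32 mg of the pentapeptide described next is about 0.058 mmol.

```python
# Standard atomic weights (g/mol), rounded to three decimals.
AVERAGE_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# Formulas derived from the peptide sequences (an assumption, not stated in the text).
TETRAPEPTIDE = {"C": 23, "H": 28, "N": 8, "O": 5}   # azidoacetyl-Gly-Pro-Phe-Pra-NH2
PENTAPEPTIDE = {"C": 25, "H": 30, "N": 8, "O": 7}   # azidoacetyl-Gly-Pro-Phe-Pra-Gly-OH


def molar_mass(formula):
    """Average molar mass (g/mol) from an element -> count mapping."""
    return sum(AVERAGE_MASS[element] * count for element, count in formula.items())


def mmol(mass_mg, formula):
    """Amount in mmol for a mass in mg (mg / (g/mol) = mmol)."""
    return mass_mg / molar_mass(formula)


def percent_yield(product_mg, start_mg):
    """Mass-based yield; valid here because cyclization does not change the formula."""
    return 100.0 * product_mg / start_mg


print(round(molar_mass(TETRAPEPTIDE), 2), round(mmol(31, TETRAPEPTIDE), 4))
print(round(percent_yield(10, 31), 1), round(mmol(32, PENTAPEPTIDE), 4))
```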

Preparation of Azidoacetyl-Gly-Pro-Phe-Pra-Gly-OH:

Using 0.3 mmol of Glycine-Wang resin, the indicated sequence was synthesized using Fmoc-protected amino acids and HATU as the activating agent. Azidoacetic acid was used in the last coupling step to cap the pentapeptide. Cleavage of the peptide was achieved using 50% TFA/DCM for 2 h. Purification by RP HPLC afforded the peptide as a white solid (83 mg, 50%). ¹H NMR (DMSO-d₆, 400 MHz): 8.4-7.9 (m, 4H), 7.2 (m, 5H), 4.7-4.2 (m, 3H), 4.0-3.7 (m, 4H), 3.5-3.3 (m, 2H), 3.1 (m, 1H), 2.91 (dd, 1H, J=4 Hz, 16 Hz), 2.84 (t, 1H, J=2.5 Hz), 2.78 (m, 1H), 2.6-2.4 (m, 2H), 2.2-1.6 (m, 4H). IR (mull) 2900, 2100, 1450, 1350 cm⁻¹. ESIMS 555.3 ([M+H], 100%). ESIMS with ion-source fragmentation: 577.1 ([M+Na], 90%), 555.3 ([M+H], 80%), 480.1 ([M-Gly], 100%), 385.1 ([M-Gly-Pra], 70%), 357.1 ([M-Gly-Pra-CO], 40%), 238.0 ([M-Gly-Pra-Phe], 80%).
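As a consistency check on the ESIMS data, [M+H]⁺ can be predicted from monoisotopic residue masses. The Python sketch below is our own illustration, not the authors' analysis: Gly, Pro and Phe use standard monoisotopic residue masses, while the Pra residue (C5H5NO), the azidoacetyl cap (C2H2N3O) and the C-terminal groups were computed from elemental masses. It predicts about 497.23 for the tetrapeptide amide and 555.23 for the pentapeptide acid, close to the reported values (497.2 to 497.4, and 555.3). The running sums of cap plus residues (238.09, 385.16 and 480.20 after Pro, Phe and Pra) also fall within about 0.1 of the fragment ions assigned as [M-Pra-Phe], [M-Pra] and [M-NH₂] for the tetrapeptide, and [M-Gly-Pra-Phe], [M-Gly-Pra] and [M-Gly] for the pentapeptide.

```python
# Monoisotopic masses (Da). Gly/Pro/Phe are standard residue masses; Pra (C5H5NO),
# the azidoacetyl cap (C2H2N3O) and the C-terminal groups are computed from elements.
RESIDUE = {"Gly": 57.02146, "Pro": 97.05276, "Phe": 147.06841, "Pra": 95.03711}
AZIDOACETYL = 84.01979            # N3-CH2-C(=O)-
C_TERMINUS = {"NH2": 16.01872, "OH": 17.00274}
PROTON = 1.00728


def mh_plus(sequence, c_term):
    """Predicted monoisotopic [M+H]+ for an azidoacetyl-capped peptide."""
    neutral = AZIDOACETYL + sum(RESIDUE[r] for r in sequence) + C_TERMINUS[c_term]
    return neutral + PROTON


def cumulative_acyl_masses(sequence):
    """Running sums of cap + residues (an N-terminal fragment series), rounded to 0.01 Da."""
    total, out = AZIDOACETYL, []
    for residue in sequence:
        total += RESIDUE[residue]
        out.append(round(total, 2))
    return out


tetra = ["Gly", "Pro", "Phe", "Pra"]
print(round(mh_plus(tetra, "NH2"), 2), round(mh_plus(tetra + ["Gly"], "OH"), 2))
print(cumulative_acyl_masses(tetra))
```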

Cyclization of Azidoacetyl-Gly-Pro-Phe-Pra-Gly-OH:

The peptide (32 mg, 0.058 mmol) was dissolved in MeCN (60 mL). Diisopropylethylamine (1 mL) and Cu(MeCN)₄PF₆ (1 mg) were added, and the solution was stirred for 2 h. The solvent was evaporated, and the crude product was subjected to RP HPLC to remove dimers and trimers. The cyclic monomer was isolated as a colorless glass (6 mg, 20%). ESIMS 555.6 ([M+H], 100%), 1109.3 ([2M+H], 20%), 1131.2 ([2M+Na], 15%).

ESIMS with ion source fragmentation: 555.3 ([M+H], 100%), 480.4([M-Gly], 30%), 452.2 ([M-Gly-CO], 25%), 424.5 ([M-Gly-2CO], 10%, onlypossible in a cyclic structure).

Conjugation of Linear Peptide to DNA:

Compound 2 (45 nmol) was dissolved in 45 μL of sodium borate buffer (pH 9.4, 150 mM). At 4° C., linear peptide (18 μL of a 100 mM stock in DMF; 1800 nmol; 40 equiv.) was added, followed by DMT-MM (3.6 μL of a 500 mM stock in water; 1800 nmol; 40 equiv.). After agitating for 2 h, LCMS showed complete reaction, and the product was isolated by ethanol precipitation. ESIMS 1823.0 ([M-3H]/3, 20%), 1367.2 ([M-4H]/4, 20%), 1093.7 ([M-5H]/5, 40%), 911.4 ([M-6H]/6, 100%).
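The four ESIMS peaks reported for the conjugate are successive charge states of one species, so each can be converted back to a neutral mass. The Python sketch below uses the relation M = z(m/z + 1.007276) for [M-zH]^z- ions and simply averages the results; it is a hand calculation for illustration, not the algorithm used by deconvolution software such as ProMass. The four estimates agree to within about 2.4 Da, around 5473 Da. Because an intramolecular cycloaddition adds no atoms, the identical values reported below for the cyclic peptide conjugate are expected.

```python
PROTON = 1.007276  # Da


def neutral_mass(mz, z):
    """Neutral mass from an [M - zH]^z- ion: M = z * (m/z + proton mass)."""
    return z * (mz + PROTON)


# (m/z, charge) pairs reported for the linear peptide-DNA conjugate.
peaks = [(1823.0, 3), (1367.2, 4), (1093.7, 5), (911.4, 6)]
estimates = [neutral_mass(mz, z) for mz, z in peaks]
mean = sum(estimates) / len(estimates)
spread = max(estimates) - min(estimates)
print([round(m, 1) for m in estimates], round(mean, 1), round(spread, 1))
```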

Conjugation of Cyclic Peptide to DNA:

Compound 2 (20 nmol) was dissolved in 20 μL of sodium borate buffer (pH 9.4, 150 mM). At 4° C., cyclic peptide (8 μL of a 100 mM stock in DMF; 800 nmol; 40 equiv.) was added, followed by DMT-MM (1.6 μL of a 500 mM stock in water; 800 nmol; 40 equiv.). After agitating for 2 h, LCMS showed complete reaction, and the product was isolated by ethanol precipitation. ESIMS 1823.0 ([M-3H]/3, 20%), 1367.2 ([M-4H]/4, 20%), 1093.7 ([M-5H]/5, 40%), 911.4 ([M-6H]/6, 100%).

Cyclization of DNA-Linked Peptide:

Linear peptide-DNA conjugate (10 nmol) was dissolved in pH 8 sodium phosphate buffer (10 μL, 150 mM). At room temperature, 4 equivalents each of CuSO₄, ascorbic acid, and the Sharpless ligand were added (0.2 μL each of 200 mM stocks). The reaction was allowed to proceed overnight. RP HPLC showed that no linear peptide-DNA conjugate remained and that the product co-eluted with authentic cyclic peptide-DNA conjugate. No traces of dimers or other oligomers were observed.

(Structures of the linear and cyclic peptide-DNA conjugates are not reproduced here; the two species elute at 4.48 min and 4.27 min.)

LC conditions: Targa C18, 2.1×40 mm, 10-40% MeCN in 40 mM aq. TEAA over 8 min.

Example 6 Application of Aromatic Nucleophilic Substitution Reactions to Functional Moiety Synthesis

General Procedure for Arylation of Compound 3 with Cyanuric Chloride:

Compound 2 is dissolved in pH 9.4 sodium borate buffer at a concentration of 1 mM. The solution is cooled to 4° C., and 20 equivalents of cyanuric chloride are then added as a 500 mM solution in MeCN. After 2 h, complete reaction is confirmed by LCMS, and the resulting dichlorotriazine-DNA conjugate is isolated by ethanol precipitation.

Procedure for Amine Substitution of Dichlorotriazine-DNA:

The dichlorotriazine-DNA conjugate is dissolved in pH 9.5 borate buffer at a concentration of 1 mM. At room temperature, 40 equivalents of an aliphatic amine are added as a DMF solution. The reaction is followed by LCMS and is usually complete after 2 h. The resulting alkylamino-monochlorotriazine-DNA conjugate is isolated by ethanol precipitation.

Procedure for Amine Substitution of Monochlorotriazine-DNA:

The alkylamino-monochlorotriazine-DNA conjugate is dissolved in pH 9.5 borate buffer at a concentration of 1 mM. At 42° C., 40 equivalents of a second aliphatic amine are added as a DMF solution. The reaction is followed by LCMS and is usually complete after 2 h. The resulting diaminotriazine-DNA conjugate is isolated by ethanol precipitation.
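Example 6 relies on each substitution making the triazine less reactive toward the next one, which is why the successive displacements are run at increasing temperature. The route can be written down as data and checked for that ordering. This sketch is illustrative only: it assumes 4 °C for the cooled first step and 22 °C for "room temperature", and the reagent labels are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    reagent: str
    equivalents: int
    temp_c: float         # 4 C and 22 C below are assumed values
    chlorides_left: int   # Cl atoms remaining on the triazine after this step


TRIAZINE_ROUTE = [
    Step("arylation", "cyanuric chloride", 20, 4.0, 2),
    Step("first amine", "aliphatic amine 1", 40, 22.0, 1),
    Step("second amine", "aliphatic amine 2", 40, 42.0, 0),
]


def check_route(steps: list[Step]) -> None:
    """Each step must displace exactly one Cl and not run colder than the last."""
    for prev, cur in zip(steps, steps[1:]):
        if cur.chlorides_left != prev.chlorides_left - 1:
            raise ValueError(f"{cur.name}: expected exactly one Cl displaced")
        if cur.temp_c < prev.temp_c:
            raise ValueError(f"{cur.name}: colder than {prev.name}")


check_route(TRIAZINE_ROUTE)
for s in TRIAZINE_ROUTE:
    print(f"{s.name:13s} {s.reagent:18s} {s.equivalents:3d} equiv "
          f"{s.temp_c:4.0f} C  Cl left: {s.chlorides_left}")
```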

Example 7 Application of Reductive Amination Reactions to Functional Moiety Synthesis

General Procedure for Reductive Amination of DNA-Linker Containing a Secondary Amine with an Aldehyde Building Block:

Compound 2 was coupled to an N-terminal proline residue. The resulting compound was dissolved in sodium phosphate buffer (50 μL, 150 mM, pH 5.5) at a concentration of 1 mM. To this solution were added 40 equivalents each of an aldehyde building block in DMF (8 μL, 0.25 M) and sodium cyanoborohydride in DMF (8 μL, 0.25 M), and the solution was heated at 80° C. for 2 hours. Following alkylation, the solution was purified by ethanol precipitation.

General Procedure for Reductive Aminations of DNA-Linker Containing an Aldehyde with Amine Building Blocks:

Compound 2 coupled to a building block comprising an aldehyde group was dissolved in sodium phosphate buffer (50 μL, 250 mM, pH 5.5) at a concentration of 1 mM. To this solution were added 40 equivalents each of an amine building block in DMF (8 μL, 0.25 M) and sodium cyanoborohydride in DMF (8 μL, 0.25 M), and the solution was heated at 80° C. for 2 hours. Following alkylation, the solution was purified by ethanol precipitation.
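In these reductive aminations, reagents are added as DMF stocks to a 1 mM aqueous DNA solution, so the final mixture is more dilute and contains a substantial organic fraction. A minimal sketch of that arithmetic using the volumes given above, assuming volumes are additive (the function and key names are illustrative):

```python
def final_mixture(dna_ul: float, dna_mm: float,
                  additions: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Final concentrations (mM), equivalents and % organic, assuming additive volumes.

    `additions` maps reagent name -> (volume in uL, stock concentration in mM),
    each delivered as a solution in DMF.
    """
    organic_ul = sum(vol for vol, _ in additions.values())
    total_ul = dna_ul + organic_ul
    dna_nmol = dna_ul * dna_mm
    result = {"DNA (mM)": dna_nmol / total_ul}
    for name, (vol_ul, stock_mm) in additions.items():
        result[f"{name} (mM)"] = vol_ul * stock_mm / total_ul
        result[f"{name} (equiv)"] = vol_ul * stock_mm / dna_nmol
    result["DMF (% v/v)"] = 100 * organic_ul / total_ul
    return result


# 50 uL of 1 mM conjugate + 8 uL each of 0.25 M (250 mM) DMF stocks
mix = final_mixture(50, 1.0, {"amine": (8, 250), "NaBH3CN": (8, 250)})
for key, value in mix.items():
    print(f"{key:18s} {value:7.2f}")
```

Under this assumption the 66 μL reaction contains about 24% DMF, and each reagent is present at 40 equivalents (about 30 mM).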

Example 8 Application of Peptoid Building Reactions to Functional Moiety Synthesis

General Procedure for Peptoid Synthesis on DNA-Linker:

Compound 2 was dissolved in sodium borate buffer (50 μL, 150 mM, pH 9.4) at a concentration of 1 mM and chilled to 4° C. To this solution were added 40 equivalents of N-hydroxysuccinimidyl bromoacetate in DMF (13 μL, 0.15 M), and the solution was gently shaken at 4° C. for 2 hours. Following acylation, the DNA-linker was purified by ethanol precipitation, redissolved in sodium borate buffer (50 μL, 150 mM, pH 9.4) at a concentration of 1 mM, and chilled to 4° C. To this solution were added 40 equivalents of an amine building block in DMF (13 μL, 0.15 M), and the solution was gently shaken at 4° C. for 16 hours. Following alkylation, the DNA-linker was purified by ethanol precipitation, redissolved in sodium borate buffer (50 μL, 150 mM, pH 9.4) at a concentration of 1 mM, and chilled to 4° C. Peptoid synthesis is continued by the stepwise addition of N-hydroxysuccinimidyl bromoacetate followed by the addition of an amine building block.
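The peptoid procedure alternates two submonomer steps per residue, each followed by ethanol precipitation: acylation with N-hydroxysuccinimidyl bromoacetate, then displacement of the bromide by an amine building block, which supplies the side chain. A minimal sketch that expands a list of amine building blocks into that step schedule; the amine names are placeholders, not from the source.

```python
def peptoid_schedule(amines: list[str]) -> list[str]:
    """Expand amine building blocks into the alternating submonomer steps.

    Conditions follow the general procedure above: borate buffer pH 9.4,
    1 mM DNA-linker, 4 C, 40 equiv of each reagent.
    """
    steps: list[str] = []
    for i, amine in enumerate(amines, start=1):
        # Acylation: bromoacetylates the terminal amine (the linker amine for
        # residue 1, then the secondary amine formed by the previous displacement).
        steps.append(f"residue {i}: acylate with NHS bromoacetate, 40 equiv, 4 C, 2 h")
        steps.append(f"residue {i}: ethanol precipitation; redissolve at 1 mM")
        # Displacement: the amine substitutes the bromide and becomes the side chain.
        steps.append(f"residue {i}: displace bromide with {amine}, 40 equiv, 4 C, 16 h")
        steps.append(f"residue {i}: ethanol precipitation; redissolve at 1 mM")
    return steps


for line in peptoid_schedule(["amine A", "amine B"]):
    print(line)
```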

Example 9 Application of the Azide-Alkyne Cycloaddition Reaction to Functional Moiety Synthesis

General Procedure

An alkyne-containing DNA conjugate is dissolved in pH 8.0 phosphate buffer at a concentration of ca. 1 mM. To this mixture are added, at room temperature, 10 equivalents of an organic azide and 5 equivalents each of copper(II) sulfate, ascorbic acid, and the ligand tris((1-benzyltriazol-4-yl)methyl)amine (TBTA). The reaction is followed by LCMS and is usually complete after 1-2 h. The resulting triazole-DNA conjugate can be isolated by ethanol precipitation.

Example 10 Identification of a Ligand to Abl Kinase from within an Encoded Library

The ability to enrich molecules of interest in a DNA-encoded library above undesirable library members is paramount to identifying single compounds with defined properties against therapeutic targets of interest. To demonstrate this enrichment ability, a known binding molecule to rhAbl kinase (GenBank U07563), described by Shah et al., Science 305, 399-401 (2004), incorporated herein by reference, was synthesized. This compound was attached to a double-stranded DNA oligonucleotide via the linker described in the preceding examples using standard chemistry methods, producing a molecule (a functional moiety linked to an oligonucleotide) similar to those produced by the methods described in Examples 1 and 2. A library produced generally as described in Example 2 and the DNA-linked Abl kinase binder were designed with unique DNA sequences that allowed qPCR analysis of both species. The DNA-linked Abl kinase binder was mixed with the library at a ratio of 1:1000. This mixture was equilibrated with rhAbl kinase; the enzyme was captured on a solid phase and washed to remove non-binding library members, and the binding molecules were eluted. The ratio of library molecules to the DNA-linked Abl kinase inhibitor in the eluate was 1:1, indicating a greater than 500-fold enrichment of the DNA-linked Abl kinase binder from a 1000-fold excess of library molecules.
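The enrichment reported in this example is the fold change in the binder-to-library ratio between the input mixture and the eluate. A minimal sketch of that calculation using the ratios stated in the example (the function name is illustrative):

```python
def enrichment_fold(binder_in: float, library_in: float,
                    binder_out: float, library_out: float) -> float:
    """Fold change in the binder:library ratio from input to eluate."""
    return (binder_out / library_out) / (binder_in / library_in)


# Input 1:1000 (binder:library); eluate 1:1.
fold = enrichment_fold(binder_in=1, library_in=1000, binder_out=1, library_out=1)
print(f"{fold:.0f}-fold enrichment")
```

The computed value of 1000-fold is consistent with the stated lower bound of greater than 500-fold.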

Equivalents

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. Such equivalents are intended to be encompassed by the following claims.

1. A method for identifying one or more compounds which bind to a biological target, said method comprising: (A) synthesizing a library of compounds, wherein the compounds comprise a functional moiety comprising two or more building blocks which is operatively linked to an initial oligonucleotide which identifies the structure of the functional moiety by: (i) providing a solution comprising m initiator compounds, wherein m is an integer of 1 or greater, where the initiator compounds consist of a functional moiety comprising n building blocks, where n is an integer of 1 or greater, which is operatively linked to an initial oligonucleotide which identifies the n building blocks; (ii) dividing the solution of step (i) into r reaction vessels, wherein r is an integer of 2 or greater, thereby producing r aliquots of the solution; (iii) reacting the initiator compounds in each reaction vessel with one of r building blocks, thereby producing r aliquots comprising compounds consisting of a functional moiety comprising n+1 building blocks operatively linked to the initial oligonucleotide; and (iv) reacting the initial oligonucleotide in each aliquot with one of a set of r distinct incoming oligonucleotides in the presence of an enzyme which catalyzes the ligation of the incoming oligonucleotide and the initial oligonucleotide, under conditions suitable for enzymatic ligation of the incoming oligonucleotide and the initial oligonucleotide; thereby producing r aliquots of molecules consisting of a functional moiety comprising n+1 building blocks operatively linked to an elongated oligonucleotide which encodes the n+1 building blocks; (B) contacting the biological target with the library of compounds, or a portion thereof, under conditions suitable for at least one member of the library of compounds to bind to the target; (C) removing library members that do not bind to the target; (D) sequencing the encoding oligonucleotides of the at least one member of the library of compounds which binds to the target; and (E) using the sequences determined in step (D) to determine the structure of the functional moieties of the members of the library of compounds which bind to the biological target, thereby identifying one or more compounds which bind to the biological target.
2. The method of claim 1, further comprising amplifying the encoding oligonucleotides of the at least one member of the library of compounds which binds to the target.
3. The method of claim 2, wherein said amplifying step comprises: (i) forming a water-in-oil emulsion to create a plurality of aqueous microreactors, wherein at least one of the microreactors comprises the at least one member of the library of compounds that binds to the target, a single bead capable of binding to the encoding oligonucleotide of the at least one member of the library of compounds that binds to the target, and amplification reaction solution containing reagents necessary to perform nucleic acid amplification; (ii) amplifying the encoding oligonucleotide in the microreactors to form amplified copies of said encoding oligonucleotide; and (iii) binding the amplified copies of the encoding oligonucleotide to the beads in the microreactors.
4. The method of claim 1, wherein said sequencing step (D) comprises: (i) annealing an effective amount of a sequencing primer to the amplified copies of the encoding oligonucleotide and extending the sequencing primer with a polymerase and a predetermined nucleotide triphosphate to yield a sequencing product and, if the predetermined nucleotide triphosphate is incorporated onto a 3′ end of said sequencing primer, a sequencing reaction byproduct; and (ii) identifying the sequencing reaction byproduct, thereby determining the sequence of the encoding oligonucleotide.
5. A method for identifying one or more compounds which bind to a biological target, said method comprising: (A) synthesizing a library of compounds, wherein the compounds comprise a functional moiety comprising two or more building blocks which is operatively linked to an initial oligonucleotide which identifies the structure of the functional moiety by: (i) providing a solution comprising m initiator compounds, wherein m is an integer of 1 or greater, where the initiator compounds consist of a functional moiety comprising n building blocks, where n is an integer of 1 or greater, which is operatively linked to an initial oligonucleotide which identifies the n building blocks; (ii) dividing the solution of step (i) into r reaction vessels, wherein r is an integer of 2 or greater, thereby producing r aliquots of the solution; (iii) reacting the initiator compounds in each reaction vessel with one of r building blocks, thereby producing r aliquots comprising compounds consisting of a functional moiety comprising n+1 building blocks operatively linked to the initial oligonucleotide; and (iv) reacting the initial oligonucleotide in each aliquot with one of a set of r distinct incoming oligonucleotides in the presence of an enzyme which catalyzes the ligation of the incoming oligonucleotide and the initial oligonucleotide, under conditions suitable for enzymatic ligation of the incoming oligonucleotide and the initial oligonucleotide; thereby producing r aliquots of molecules consisting of a functional moiety comprising n+1 building blocks operatively linked to an elongated oligonucleotide which encodes the n+1 building blocks; (B) contacting the biological target with the library of compounds, or a portion thereof, under conditions suitable for at least one member of the library of compounds to bind to the target; (C) removing library members that do not bind to the target; (D) sequencing the encoding oligonucleotides of the at least one member of the library of compounds which binds to the target, wherein said sequencing comprises: (i) annealing an effective amount of a sequencing primer to the amplified copies of the encoding oligonucleotide and extending the sequencing primer with a polymerase and a predetermined nucleotide triphosphate to yield a sequencing product and, if the predetermined nucleotide triphosphate is incorporated onto a 3′ end of said sequencing primer, a sequencing reaction byproduct; and (ii) identifying the sequencing reaction byproduct, thereby determining the sequence of the encoding oligonucleotide; and (E) using the sequence of the encoding oligonucleotide determined in step (D) to determine the structure of the functional moieties of the members of the library of compounds which bind to the biological target, thereby identifying one or more compounds which bind to the biological target.
6. The method of claim 5, further comprising amplifying the encoding oligonucleotides of the at least one member of the library of compounds which binds to the target.
7. The method of claim 6, wherein said amplification of the encoding oligonucleotides is carried out by a method selected from the group consisting of: the polymerase chain reaction (PCR), transcription-based amplification, rapid amplification of cDNA ends, continuous flow amplification, and rolling circle amplification.
8. The method of any one of claims 1, 4, or 5, wherein said sequencing of the encoding oligonucleotides is carried out by a pyrophosphate-based sequencing reaction or a single molecule sequencing by synthesis method.
9. The method of claim 8, wherein the sequencing reaction byproduct is PPi and a coupled sulfurylase/luciferase reaction is used to generate light for detection.
10. The method of any one of claims 1 or 5, further comprising the step of enriching for beads which bind amplified copies of the encoding oligonucleotide away from beads to which no encoding oligonucleotide is bound.
11. The method of claim 10, wherein the method for said enrichment step is selected from the group consisting of affinity purification and electrophoresis.
12. The method of claim 3, further comprising breaking the emulsion to retrieve one or more of the amplified copies of the encoding oligonucleotide.
13. The method of claim 1 or 5, further comprising the step of (A)(v) combining two or more of the r aliquots, thereby producing a solution comprising molecules consisting of a functional moiety comprising n+1 building blocks, which is operatively linked to an elongated oligonucleotide which encodes the n+1 building blocks.
14. The method of claim 13, wherein r aliquots are combined.
15. The method of claim 13, wherein the steps (A)(i) to (A)(v) are conducted one or more times to yield cycles 1 to i, where i is an integer of 2 or greater, wherein in cycle s+1, where s is an integer of i-1 or less, the solution comprising m initiator compounds of step (A)(i) is the solution of step (A)(v) of cycle s.
16. The method of claim 1 or 5, wherein at least one of the building blocks is an amino acid.
17. The method of claim 1 or 5, wherein the initial oligonucleotide is a covalently coupled double-stranded oligonucleotide.
18. The method of claim 17, wherein the incoming oligonucleotide is a double-stranded oligonucleotide.
19. The method of claim 1 or 5, wherein the initiator compounds comprise a linker moiety comprising a first functional group adapted to bond with a building block, a second functional group adapted to bond to the 5′ end of an oligonucleotide, and a third functional group adapted to bond to the 3′ end of an oligonucleotide.
20. The method of claim 19, wherein the linker moiety is of the structure

wherein A is a functional group adapted to bond to a building block; B is a functional group adapted to bond to the 5′ end of an oligonucleotide; C is a functional group adapted to bond to the 3′ end of an oligonucleotide; S is an atom or a scaffold; D is a chemical structure that connects A to S; E is a chemical structure that connects B to S; and F is a chemical structure that connects C to S.
21. The method of claim 20, wherein: A is an amino group; B is a phosphate group; and C is a phosphate group.
22. The method of claim 20, wherein D, E and F are each, independently, an alkylene group or an oligo(ethylene glycol) group.
23. The method of claim 20, wherein S is a carbon atom, a nitrogen atom, a phosphorus atom, a boron atom, a phosphate group, a cyclic group or a polycyclic group.
24. The method of claim 23, wherein the linker moiety is of the structure

wherein each of n, m and p is, independently, an integer from 1 to about 20.
25. The method of claim 24, wherein each of n, m and p is independently an integer from 2 to 8.
26. The method of claim 25, wherein each of n, m and p is independently an integer from 3 to 6.
27. The method of claim 24, wherein the linker moiety has the structure


28. The method of claim 1 or 5, wherein each of said initiator compounds comprises a reactive group and wherein each of said r building blocks comprises a complementary reactive group which is complementary to said reactive group.
29. The method of claim 28, wherein the reactive group and the complementary reactive group are selected from the group consisting of an amino group; a carboxyl group; a sulfonyl group; a phosphonyl group; an epoxide group; an aziridine group; and an isocyanate group.
30. The method of claim 28, wherein the reactive group and the complementary reactive group are selected from the group consisting of a hydroxyl group; a carboxyl group; a sulfonyl group; a phosphonyl group; an epoxide group; an aziridine group; and an isocyanate group.
31. The method of claim 28, wherein the reactive group and the complementary reactive group are selected from the group consisting of an amino group and an aldehyde or ketone group.
32. The method of claim 28, wherein the reaction between the reactive group and the complementary reactive group is conducted under reducing conditions.
33. The method of claim 28, wherein the reactive group and the complementary reactive group are selected from the group consisting of a phosphorus ylide group and an aldehyde or ketone group.
34. The method of claim 28, wherein the reactive group and the complementary reactive group react via cycloaddition to form a cyclic structure.
35. The method of claim 34, wherein the reactive group and the complementary reactive group are selected from the group consisting of an alkyne and an azide.
36. The method of claim 28, wherein the reactive group and the complementary reactive group are selected from the group consisting of a halogenated heteroaromatic group and a nucleophile.
37. The method of claim 36, wherein the halogenated heteroaromatic group is selected from the group consisting of chlorinated pyrimidines, chlorinated triazines and chlorinated purines.
38. The method of claim 36, wherein the nucleophile is an amino group.
39. The method of claim 13, further comprising, following cycle i, the step of: (A)(vi) cyclizing one or more of the functional moieties.
40. The method of claim 39, wherein a functional moiety of step (A)(vi) comprises an azido group and an alkynyl group.
41. The method of claim 40, wherein the functional moiety is maintained under conditions suitable for cycloaddition of the azido group and the alkynyl group to form a triazole group, thereby forming a cyclic functional moiety.
42. The method of claim 41, wherein the cycloaddition reaction is conducted in the presence of a copper catalyst.
43. The method of claim 42, wherein at least one of the one or more functional moieties of step (A)(vi) comprises at least two sulfhydryl groups, and said functional moiety is maintained under conditions suitable for reaction of the two sulfhydryl groups to form a disulfide group, thereby cyclizing the functional moiety.
44. The method of claim 1 or 5, wherein the initial oligonucleotide comprises a PCR primer sequence.
45. The method of claim 13, wherein the incoming oligonucleotide of cycle i comprises a PCR closing primer.
46. The method of claim 13, further comprising, following cycle i, the step of (d) ligating an oligonucleotide comprising a closing PCR primer sequence to the encoding oligonucleotide.
47. The method of claim 46, wherein the oligonucleotide comprising a closing PCR primer sequence is ligated to the encoding oligonucleotide in the presence of an enzyme which catalyzes said ligation.
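Claims 1, 13 and 15 describe split-and-pool synthesis in which each building-block addition is recorded by ligating a matching incoming oligonucleotide, and step (E) reads the resulting sequence back to a structure. The toy simulation below sketches that bookkeeping. It assumes hypothetical building-block names, 4-nt codons and a placeholder headpiece; real codon designs are not specified here. Library size grows as the product of the number of aliquots per cycle (r^i when every cycle uses r).

```python
# Hypothetical codon tables, one per cycle: building block -> 4-nt DNA codon.
CYCLES = [
    {"BB1a": "ACGT", "BB1b": "TGCA", "BB1c": "GATC"},
    {"BB2a": "CCAA", "BB2b": "GGTT"},
    {"BB3a": "ATAT", "BB3b": "CGCG", "BB3c": "TTAA"},
]
HEADPIECE = "AAAA"  # placeholder for the initial oligonucleotide
CODON_LEN = 4


def split_and_pool(cycles: list[dict[str, str]]) -> dict[str, tuple[str, ...]]:
    """Enumerate library species as {encoding tag: building-block tuple}.

    Species-level view: each species is assumed present in every aliquot,
    so each cycle pairs every existing species with every building block,
    appending the block to the moiety and ligating its codon onto the tag.
    """
    pool: dict[str, tuple[str, ...]] = {HEADPIECE: ()}
    for table in cycles:
        pooled: dict[str, tuple[str, ...]] = {}
        for tag, moiety in pool.items():
            for block, codon in table.items():  # one aliquot per building block
                pooled[tag + codon] = moiety + (block,)
        pool = pooled  # combine the aliquots
    return pool


def decode(tag: str, cycles: list[dict[str, str]]) -> tuple[str, ...]:
    """Map a sequenced tag back to its building blocks, cycle by cycle."""
    body = tag[len(HEADPIECE):]
    blocks = []
    for i, table in enumerate(cycles):
        codon = body[i * CODON_LEN:(i + 1) * CODON_LEN]
        reverse = {c: b for b, c in table.items()}
        blocks.append(reverse[codon])  # KeyError if the codon is unknown
    return tuple(blocks)


library = split_and_pool(CYCLES)
print(len(library), "species")  # 3 * 2 * 3 = 18
print(decode(HEADPIECE + "TGCA" + "GGTT" + "CGCG", CYCLES))  # ('BB1b', 'BB2b', 'BB3b')
```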